Abstract

We study the computational power of polynomial threshold functions, that is, 
threshold functions of real polynomials over the boolean cube. We provide two new 
results bounding the computational power of this model. 

Our first result shows that low-degree polynomial threshold functions cannot ap- 
proximate any function with many influential variables. We provide a couple of exam- 
ples where this technique yields tight approximation bounds. 

Our second result relates to constructing pseudorandom generators fooling low-degree
polynomial threshold functions. This problem has received attention recently,
where Diakonikolas et al. [13] proved that k-wise independence suffices to fool linear
threshold functions. We prove that any low-degree polynomial threshold function
that can be represented as a function of a small number of linear threshold functions
can also be fooled by k-wise independence. We view this as an important step towards
fooling general polynomial threshold functions, and we discuss a plausible approach
to achieving this goal based on our techniques.

Our results combine tools from real approximation theory, hyper-contractive
inequalities and probabilistic methods. In particular, we develop several new tools in
approximation theory which may be of independent interest.

1 Introduction

A boolean function $h : \{-1,1\}^n \to \{-1,1\}$ is a threshold (or sign) function of a real
function $f : \{-1,1\}^n \to \mathbb{R}$ if

$$h(x_1, \ldots, x_n) = \mathrm{sgn}(f(x_1, \ldots, x_n)).$$



In this work we study thresholds of low-degree polynomials, or Polynomial Threshold
Functions (PTFs). There is a long line of research that studies the case of linear functions,
i.e. degree-1 polynomials, which are commonly called Linear Threshold Functions (LTFs), or
halfspaces (see, e.g., [18], [8], [13] and the references therein). A key example of an LTF is
the majority function, which can be defined as

$$\mathrm{Maj}(x_1, \ldots, x_n) = \mathrm{sgn}(x_1 + \ldots + x_n - \lceil n/2 \rceil).$$

The main challenge that we tackle in our work is bounding the computational power of
low-degree PTFs. We consider two main problems: constructing explicit pseudorandom
distributions that fool low-degree PTFs, and providing lower bounds for the computation
and approximation capabilities of PTFs.

Pseudorandom generators for PTFs An important question is whether k-wise independence
fools PTFs for small values of k. In particular, it is interesting whether k can be
independent of the number of variables n.

A boolean function $h : \{-1,1\}^n \to \{-1,1\}$ is $\varepsilon$-fooled by k-wise independence if for any
k-wise independent distribution K taking values in $\{-1,1\}^n$ we have

$$\left| \mathbb{P}_{x \in K}[h(x) = 1] - \mathbb{P}_{x \in U}[h(x) = 1] \right| \le \varepsilon,$$

where U denotes the uniform distribution over $\{-1,1\}^n$. We say that k-wise independence
fools degree-d polynomials if it fools any threshold function $h(x) = \mathrm{sgn}(f(x) - t)$, for $t \in \mathbb{R}$,
for any degree-d real polynomial $f$. This notion can be extended to fooling real functions.

The problem of whether k-wise independence fools LTFs was first addressed by Benjamini
et al. [8], who proved that k-wise independence fools the majority function, and subsequently
by Diakonikolas et al. [13], who proved that k-wise independence fools LTFs. In both cases
$k = \mathrm{polylog}(1/\varepsilon) \cdot \varepsilon^{-2}$ was required to achieve error $\varepsilon$.

Our first result extends the result of Diakonikolas et al. [13] to thresholds of low-degree
polynomials which depend on a small number of linear functions. We see it as an important
step towards building pseudorandom generators fooling general PTFs. For a real polynomial
$p(x) = \sum_{I} p_I \prod_{i \in I} x_i$ define its weight as the sum of the absolute values of the coefficients,
excluding the constant coefficient, that is

$$\mathrm{wt}(p) = \sum_{I \ne \emptyset} |p_I|.$$

Theorem 1. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-d polynomial which can be decomposed as a
function of m linear functions. That is, there exist linear functions $g_1, \ldots, g_m : \{-1,1\}^n \to \mathbb{R}$
and a degree-d polynomial $p : \mathbb{R}^m \to \mathbb{R}$ such that

$$f(x) = p(g_1(x), \ldots, g_m(x))$$

for all $x \in \{-1,1\}^n$. Assume that $g_1, \ldots, g_m$ are normalized such that $\mathbb{E}[g_1^2] = \ldots = \mathbb{E}[g_m^2] = 1$.
Then k-wise independence $\varepsilon$-fools $f(x)$ for

$$k = \exp(O(d/\varepsilon)^d) + \mathrm{poly}((\log m \cdot d/\varepsilon), m, \mathrm{wt}(p)).$$
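
As an illustration of the two quantities defined above (it is not part of the paper's argument), the following Python sketch computes the fooling gap of a small pairwise independent distribution and the weight of a polynomial by brute force. The helper names `fooling_gap` and `weight`, the dictionary encoding of a polynomial, and the convention sgn(0) = 1 are assumptions of this sketch.

```python
from itertools import product

def sgn(v):
    # Convention assumed in this sketch: sgn(0) = 1.
    return 1 if v >= 0 else -1

def fooling_gap(h, support, n):
    """|P_{x in K}[h(x)=1] - P_{x in U}[h(x)=1]|, with K uniform over `support`."""
    cube = list(product((-1, 1), repeat=n))
    p_uniform = sum(h(x) == 1 for x in cube) / len(cube)
    p_k = sum(h(x) == 1 for x in support) / len(support)
    return abs(p_k - p_uniform)

def weight(coeffs):
    """wt(p): sum of |coefficient| over all non-constant monomials.
    `coeffs` maps a tuple of variable indices (the monomial) to its coefficient."""
    return sum(abs(c) for mono, c in coeffs.items() if len(mono) > 0)

# A pairwise (2-wise) independent distribution on {-1,1}^3: x3 = x1 * x2.
pairwise = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

ltf = lambda x: sgn(x[0] + x[1] + x[2])   # a linear threshold function
parity = lambda x: x[0] * x[1] * x[2]     # a degree-3 monomial

gap_ltf = fooling_gap(ltf, pairwise, 3)
gap_parity = fooling_gap(parity, pairwise, 3)

# p(z) = 3 + 2*z_0 - z_0*z_1; the constant term is excluded from the weight.
wt_example = weight({(): 3, (0,): 2, (0, 1): -1})
```

On this tiny example pairwise independence is far from fooling either function, which is consistent with the fact that the known results need k to grow as the error shrinks.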

Lower bounds for approximation by PTFs A boolean function $g : \{-1,1\}^n \to \{-1,1\}$
is said to be $\varepsilon$-approximated by degree-d PTFs if there exists a degree-d PTF $h(x)$ such that
$\mathbb{P}_{x \in U}[h(x) = g(x)] \ge 1 - \varepsilon$.

We prove that functions whose variables have high influence cannot be approximated by
low-degree PTFs, where the influence of a variable $x_i$ in $g$ is defined as the probability that
flipping $x_i$ changes the value of $g$, i.e.

$$\mathrm{Inf}_i(g) = \mathbb{P}_{x \in U}[g(x) \ne g(x \oplus e_i)],$$

where $e_i$ is the i-th unit vector. We prove

Theorem 2. Let $g : \{-1,1\}^n \to \{-1,1\}$ be a boolean function, such that $\mathrm{Inf}_i(g) \ge \tau$ for at
least $n^{\alpha}$ variables. Then for any degree-d polynomial threshold function $h$ we have

$$\mathbb{P}_x[h(x) = g(x)] \le 1 - \frac{\tau}{2} + \eta$$

where $\eta = O(d / (\alpha \log n)^{1/8d})$.

We illustrate the power of Theorem 2 by showing two examples. The first one shows that the
$\mathrm{MOD}_m$ function cannot be approximated by low-degree PTFs, while the second shows
that any low-degree polynomial over $\mathbb{F}_2$ cannot be approximated by low-degree PTFs much
better than the best trivial approximation. Define the $\mathrm{MOD}_m$ function as

$$\mathrm{MOD}_m(x_1, \ldots, x_n) = \begin{cases} 1 & \sum_{i=1}^{n} \frac{x_i + 1}{2} \equiv 0 \pmod{m} \\ -1 & \sum_{i=1}^{n} \frac{x_i + 1}{2} \not\equiv 0 \pmod{m} \end{cases}$$

Note that as $\frac{x_i + 1}{2} \in \{0, 1\}$, this definition is essentially equivalent to the common one. We
have the following.

Corollary 3. Let $h : \{-1,1\}^n \to \{-1,1\}$ be a degree-d polynomial threshold function for
$d \le O(\log \log n / \log \log \log n)$. Then

$$\mathbb{P}[h(x) = \mathrm{MOD}_m(x)] \le 1 - \frac{1}{m} + o(1).$$
This result is tight in the sense that trivially the $\mathrm{MOD}_m$ function admits a $1 - \frac{1}{m}$
approximation by the constant $-1$ function (which is also a degree-0 PTF).
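
For small n, both quantities in this discussion can be checked by exhaustive enumeration. The sketch below is illustrative only: the function names and the choice n = 10, m = 3 are assumptions of the example. It computes the influence of one variable in MOD_m and the agreement of MOD_m with the constant -1 function. For finite n the influence is only approximately 2/m, with the difference absorbed by the o(1) term.

```python
from itertools import product

def mod_m(x, m):
    """MOD_m on {-1,1}^n: 1 if the number of +1 coordinates is 0 mod m, else -1."""
    ones = sum((xi + 1) // 2 for xi in x)
    return 1 if ones % m == 0 else -1

def influence(g, n, i):
    """Inf_i(g): fraction of inputs x for which flipping coordinate i changes g."""
    flips = 0
    for x in product((-1, 1), repeat=n):
        y = list(x)
        y[i] = -y[i]
        if g(x) != g(tuple(y)):
            flips += 1
    return flips / 2 ** n

n, m = 10, 3
g = lambda x: mod_m(x, m)

inf0 = influence(g, n, 0)
agree_const = sum(g(x) == -1 for x in product((-1, 1), repeat=n)) / 2 ** n
```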

Corollary 4. Let $q : \{-1,1\}^n \to \{-1,1\}$ be a degree-r polynomial over $\mathbb{F}_2$ depending on
all variables. Let $h : \{-1,1\}^n \to \{-1,1\}$ be a degree-d polynomial threshold function for
$d \le O(\log \log n / \log \log \log n)$. Then

$$\mathbb{P}[h(x) = q(x)] \le 1 - 2^{-r} + o(1).$$

This result is essentially tight: if $q$ is a product of r linear forms, then the constant 1
function gives a $1 - 2^{-r}$ approximation of $q$.

1.1 Tools 

Approximation tools and k-wise independence. Several recent works used the method
of approximating by real polynomials to show that certain families of functions are fooled
by k-wise independent distributions. This method can be described as follows. In order
to show that k-wise independence $\varepsilon$-fools a certain family of functions, one has to show
that for every function $f$ in that family, there are a degree-k polynomial $p_l$ and a degree-k
polynomial $p_u$, such that for every $x \in \{-1,1\}^n$ we have $p_l(x) \le f(x) \le p_u(x)$, and such
that $\mathbb{E}_x[p_u(x) - p_l(x)] \le \varepsilon$. Using this technique, Bazzi [7] proved in a breakthrough paper
that logarithmic-wise independence fools DNF and CNF formulas. Later, Braverman [10]
proved that polylogarithmic-wise independence fools small constant-depth circuits, settling
a conjecture of Linial and Nisan [20].

In this work we use the method of approximating polynomials for the problem of fooling 
low degree PTFs. We introduce a general method of obtaining polynomials which are both 
bounding and approximating for any function which depends on a small number of subfunc- 
tions whose tail distribution 'behaves nicely'. In our case we apply it for functions of a few 
linear functions, but we believe that these methods should have independent interest. 

Our starting point is the multidimensional Jackson's theorem, which states that every
Lipschitz function $f$ on m variables admits an $\varepsilon$-approximation by a degree-d polynomial,
where d depends only on $\varepsilon$, m and the Lipschitz constant of $f$. We then use several
additional techniques to show that $f$ admits a polynomial approximation $p$ which is a good
approximation in a multidimensional box near the origin, and above $f$ everywhere. We then
apply these techniques, together with some concentration and anti-concentration results, to
show that $p$ is a good approximation for $f$.

Finally, we apply these techniques to show that any threshold of a function of a few linear
functions (or a function of a few linear PTFs) can be fooled by k-wise independence, for k
that is independent of the number of variables.

Decision trees and approximation of PTF. Our first tool is a new structural result
about PTFs. Given a polynomial threshold function $p$, we show that there is a small set of
variables such that, under most assignments to them, we obtain a function with no influential
variable. More precisely, the partial assignments are given by a small-depth decision tree.

Let D be a decision tree on the variables $x_1, \ldots, x_n$. Each internal node of D is labeled
by some variable and has two outgoing edges, corresponding to the possible assignments to
this variable. The leaves of the decision tree correspond to partial assignments to
the variables. The set of the leaves of D is denoted by $L(D)$, and for any $\ell \in L(D)$ and
a function $f(x_1, \ldots, x_n)$ we denote by $f|_\ell$ the function restricted to the partial assignment
given by $\ell$. For more precise definitions see Section 2. We prove the following result.

Lemma 5. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-d polynomial, and let $h(x) = \mathrm{sgn}(f(x))$. For
any $\varepsilon, \delta > 0$, there exists a decision tree D of depth at most $2^{O(d/\delta)} \cdot \log(1/\varepsilon)$, such that

$$\mathbb{P}_{\ell \in L(D)}[\mathrm{Inf}_\infty(f|_\ell) > \delta] \le \varepsilon$$

and

$$\mathbb{P}_{\ell \in L(D)}[\mathrm{Inf}_\infty(h|_\ell) > \delta'] \le \varepsilon$$

for $\delta' = O(d \cdot \delta^{1/8d})$.

We sketch the proof of Theorem 2. If a function $g$ approximates a PTF $h$, then after
most partial assignments of variables, $g$ still approximates $h$. We show that under most of
these assignments, the obtained PTF does not have any influential variable, and therefore
cannot approximate functions with many influential variables.

Independently of our work, Diakonikolas et al. [16] and Harsha et al. [19] proved similar
results. We state their results in our terminology.

Theorem 6 (Theorem 1 in [16]). Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-d polynomial, and let
$h(x) = \mathrm{sgn}(f(x))$. For any $\tau > 0$, there exists a decision tree D of depth $\frac{1}{\tau} \cdot (d \log \frac{1}{\tau})^{O(d)}$ such
that with probability $1 - \tau$ over a random leaf $\ell \in L(D)$, the function $h|_\ell$ is either $\tau$-close to
being constant, or has $\mathrm{Inf}_\infty(h|_\ell) \le \tau$.

Theorem 7 (Lemmas 5.1 and 5.2 in [19J). Let f : { — 1, l} n — ■> M be a degree-d polynomial, 
and let h(x) = sgn(/(x)). For any t > 0, there exists a decision tree D of depth poly ^° g ( r ) . 
exp(d) such that with probability 1 — r over a random leaf £ G L(D), the function h\e is either 
t -close to being constant, or has Xwi^iK) < t. 

We note that using Theorem 7 instead of Lemma 22, one can get an improvement in the
dependence on the degree in Theorem 2. In particular, Corollaries 3 and 4 hold for degrees
$d \le O(\log n / \log \log n)$.

1.2 Towards fooling low degree PTFs 

We propose a general method for proving that k-wise independence fools low-degree PTFs.
This is a high-level approach, and currently we are able to prove only a special case.

Let $f : \{-1,1\}^n \to \mathbb{R}$ be a real function. We say that $f$ is $\delta$-normal if the distribution of
$f(x)$ over uniform input is $\delta$-close to the standard normal distribution. That is,

$$\left| \mathbb{P}_{x \in U}[f(x) \ge t] - \mathbb{P}[N \ge t] \right| \le \delta$$

for any $t \in \mathbb{R}$, where $N \sim N(0, 1)$ is a standard normal variable. In what follows we let $f(x)$
be a degree-d polynomial, $h(x) = \mathrm{sgn}(f(x))$ a PTF and $\varepsilon > 0$ the required error.
(i). Reduction to low-influence PTF: It is enough to prove that k-wise independence
fools PTFs with small influences. We prove this in Lemma 22 and Claim 12. The
important property of PTFs with low influences is that their distribution is not
concentrated around any specific value (see Lemma 19), which can later be used to build
approximating polynomials for such functions.

(ii). $\delta$-normal polynomials: Assume that $f(x)$ is a degree-d polynomial with low
influences which is $\delta(\varepsilon)$-regular. Then $h(x) = \mathrm{sgn}(f(x))$ is fooled by $k(\varepsilon)$-wise independence.
This can be proved using the same proof technique as Diakonikolas et al. [13], using
the approximating polynomials for the sgn function they construct, and replacing
the tail bounds for linear polynomials by the normal distribution.

(iii). Functions of a few $\delta$-normal polynomials: Assume that $f(x)$ is a degree-d
polynomial with low influences, which can be decomposed as a function of m polynomials
$g_1, \ldots, g_m$, each of which is $\delta(m, \varepsilon)$-normal. Then $h(x) = \mathrm{sgn}(f(x))$ is fooled by $k(m, \varepsilon)$-wise
independence. Our proofs can be slightly altered to prove this, again replacing tail
bounds for linear polynomials by the normal distribution. This can also be extended
to allow a small error term.

(iv). Regularization of degree-d polynomials: We conjecture that for every $\delta, \tau > 0$,
any degree-d polynomial $f : \{-1,1\}^n \to \mathbb{R}$ can be regularized in the following way.
There exist a small number $t = t(d, \delta, \tau)$ of variables $x_{i_1}, \ldots, x_{i_t}$, a small number
$m = m(d, \delta, \tau)$ of $\delta$-normal polynomials $g_1, \ldots, g_m : \{-1,1\}^n \to \mathbb{R}$, a low-degree
polynomial $p : \mathbb{R}^{t+m} \to \mathbb{R}$ and an error polynomial $e : \{-1,1\}^n \to \mathbb{R}$ with $\|e\|_2 \le \tau$, such
that

$$f(x) = p(x_{i_1}, \ldots, x_{i_t}, g_1(x), \ldots, g_m(x)) + e(x).$$

For linear polynomials, this can be proved using the tools of Diakonikolas et al. [13].
We were able to prove this conjecture also for quadratic polynomials, and conjecture
that the same holds for all constant degrees d.

(v). Putting everything together: Let $f(x)$ be a degree-d PTF. We start by reducing it
to a PTF with low influences using a partial assignment for a small number of variables.
We use the conjecture to decompose it as a function of a small number of $\delta$-normal
PTFs, and use this decomposition to prove that k-wise independence fools $f$.

So where does this fail? The critical point of failure is the relation between the number
of functions m used in the decomposition of $f$ and the required distance $\delta$ between their
distribution and the normal distribution. We can prove that if $f$ can be decomposed into
a function of m $\delta$-normal functions for small enough $\delta$, then the proof follows through. The
problem is that $\delta$ has to be very small; in particular $\delta \le \exp(-m^{b})$. On the other hand, in
the regularization conjecture, the number of components m depends on $\delta$. We can prove the
regularization conjecture for quadratic polynomials for $m \ge 1/\delta^2$. These two requirements
have no common solution.

We note that, independently of our work, Meka and Zuckerman [24] constructed an explicit
pseudorandom generator fooling all degree-d PTFs. Their construction partitions
the set of inputs into a small number of buckets (using a pairwise independent hash function),
and then applies a k-wise independent distribution to each bucket independently.

1.3 More related Work 

The study of distributions that fool low-degree polynomials and related functions has received 
considerable attention. For example, fooling linear polynomials over finite fields [25l H], 
which has a numerous number of applications and extensions, pseudorandom generators for 
low degree polynomials [H [2TJ [271 E] and fooling modular sums [22] . 

Bruck [11] studied polynomial threshold functions, and proved that such functions can
be computed by depth-2 polynomial-sized circuits with unbounded fan-in linear threshold
gates. Aspnes et al. [6] studied the approximation of boolean functions by threshold
functions. Namely, they studied the best possible approximation for the parity function and
other symmetric functions by low-degree PTFs, and proved that for every degree-k PTF $p$,
we have

$$\mathbb{P}_x[p(x) \ne \mathrm{PARITY}(x)] \ge \frac{\sum_{i=0}^{\lfloor (n-k-1)/2 \rfloor} \binom{n}{i}}{2^n},$$

and this bound is tight. However, their bounds for other functions are not fully explicit and
are not tight.
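
To get a feel for the parity bound, it can be evaluated directly. The following Python sketch is illustrative only, and the function name `parity_error_lower_bound` is an assumption of the example. It shows two sanity checks: a degree-0 PTF (a constant) must err on half of the inputs when n is odd, while for k = n the sum is empty and the bound is 0, consistent with parity itself being a degree-n polynomial.

```python
from math import comb

def parity_error_lower_bound(n, k):
    """Right-hand side of the bound: sum_{i=0}^{floor((n-k-1)/2)} C(n, i) / 2^n."""
    top = (n - k - 1) // 2          # floor division; equals -1 when k == n
    return sum(comb(n, i) for i in range(top + 1)) / 2 ** n

half = parity_error_lower_bound(5, 0)      # constants err on half the cube
zero = parity_error_lower_bound(5, 5)      # empty sum: no error is forced
single = parity_error_lower_bound(5, 4)    # only the i = 0 term survives
```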

A few recent results consider the problem of constructing pseudorandom generators for
threshold functions. This problem has a natural geometrical interpretation. Rabani and
Shpilka [26] provided a construction of an $\varepsilon$-net for halfspaces, namely, a set of points S such
that for every halfspace $h$ that satisfies $\varepsilon \le \mathbb{P}_{x \in \{-1,1\}^n}[h(x) = 1] \le 1 - \varepsilon$ there are two
points $s_1, s_2 \in S$ such that $h(s_1) = -1$ and $h(s_2) = 1$. The size of their construction is
polynomial in n and $\frac{1}{\varepsilon}$. Diakonikolas et al. [13] proved that any k-wise independent
distribution fools halfspaces, for k that is polynomial in $\frac{1}{\varepsilon}$. The dependence of k on $\varepsilon$ is
nearly optimal, as shown by Benjamini et al. [8].

A subsequent work of Diakonikolas et al. [H] shows that k-wise independence fools
quadratic threshold functions, and intersections of such functions.

The rest of our paper is organized as follows. We introduce some preliminary definitions
and tools in Section 2. This section includes definitions and results that are related to
k-wise independence, decision trees, concentration of multivariate polynomials and some other
analytical tools. In Section 3 we present our new structural results on low-degree PTFs, and
present our application that shows that certain functions cannot be approximated by
low-degree PTFs. Finally, in Section 4 we present our new tools from approximation theory, and
show that k-wise independence fools thresholds of functions of a few linear polynomials.

Throughout this work we do not try to optimize constants. Also, we omit floor and 
ceiling signs whenever these are not crucial. 
2 Preliminaries

In this section we provide some necessary definitions that will be widely used throughout the
work, including definitions and tools related to k-wise independent distributions, decision
trees, analytical tools, and concentration bounds for multivariate polynomials.

2.1 k-wise independent distributions and polynomials

A distribution $\mathcal{D}$ on the boolean cube $\{-1,1\}^n$ is k-wise independent if the marginal
distribution of any k coordinates is the uniform distribution. There are explicit constructions of
such distributions of size $O(n^{\lfloor k/2 \rfloor})$, and these constructions are essentially optimal [2].

Given a class of functions $\mathcal{S}$ from the boolean cube to $\{-1,1\}$, a distribution $\mathcal{D}$ $\varepsilon$-fools $\mathcal{S}$
if for every $\varphi \in \mathcal{S}$, we have

$$\left| \mathbb{P}_{x \in U}[\varphi(x) = 1] - \mathbb{P}_{x \in \mathcal{D}}[\varphi(x) = 1] \right| \le \varepsilon.$$

Combining these two definitions, for simplicity we define the following.

Definition 8 (k-wise independence fooling boolean functions). A boolean function
$f : \{-1,1\}^n \to \{-1,1\}$ is said to be fooled by k-wise independence with error $\varepsilon$ if for any
k-wise independent distribution K,

$$\left| \mathbb{P}_{x \in U}[f(x) = 1] - \mathbb{P}_{x \in K}[f(x) = 1] \right| \le \varepsilon.$$

The following condition is sufficient for k-wise independent distributions to $\varepsilon$-fool a boolean
function.

Claim 9. Let $f : \{-1,1\}^n \to \{-1,1\}$. Assume there are two degree-k polynomials
$p_u, p_l : \{-1,1\}^n \to \mathbb{R}$ such that

• $p_l(x) \le f(x) \le p_u(x)$ for all $x \in \{-1,1\}^n$.

• $\mathbb{E}_{x \in U}[p_u(x) - p_l(x)] \le \varepsilon$.

Then k-wise independence fools $f$ with error $\varepsilon$.

The proof of this claim is simple, and can be found for example in [7]. The key observation
is that any polynomial of degree at most k has the same expectation under every k-wise
independent distribution as under the uniform distribution. It is worth noting
that Bazzi [7] also proved that the condition is necessary using linear programming duality.
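
The mechanism behind Claim 9 can be seen on a toy instance. The Python sketch below is illustrative only: the function f (an AND of three bits), the sandwiching polynomials, and the pairwise independent distribution are all choices made for this example, not objects from the paper. It checks the sandwiching condition, checks that a degree-2 polynomial has the same mean under a 2-wise independent distribution and under the uniform distribution, and compares the resulting gap in E[f] with ε. Note that the gap is measured here in expectation, which is twice the gap in P[f(x) = 1].

```python
from itertools import product

cube = list(product((-1, 1), repeat=3))
# A pairwise (2-wise) independent distribution on {-1,1}^3: x3 = x1 * x2.
pairwise = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def mean(fn, points):
    return sum(fn(x) for x in points) / len(points)

f = lambda x: 1 if x == (1, 1, 1) else -1             # AND of three bits
p_u = lambda x: (x[0] + x[1] + x[0] * x[1] - 1) / 2   # degree 2: AND of x1, x2
p_l = lambda x: -1                                    # degree 0

sandwiched = all(p_l(x) <= f(x) <= p_u(x) for x in cube)
eps = mean(lambda x: p_u(x) - p_l(x), cube)           # E_U[p_u - p_l]
same_mean = mean(p_u, pairwise) == mean(p_u, cube)    # degree <= 2 is preserved
gap = abs(mean(f, pairwise) - mean(f, cube))          # |E_K[f] - E_U[f]|
```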

Our next definition extends the notion of fooling boolean functions, and defines it for
real functions as well.

Definition 10 (k-wise independence fooling real functions). Let $f : \{-1,1\}^n \to \mathbb{R}$ be a
function. We say that k-wise independent distributions fool $f$ with error $\varepsilon$ if for any k-wise
independent distribution K over $\{-1,1\}^n$, and any $t \in \mathbb{R}$,

$$\left| \mathbb{P}_{x \in U}[f(x) \le t] - \mathbb{P}_{x \in K}[f(x) \le t] \right| \le \varepsilon.$$
A real function $f(x_1, \ldots, x_n)$ is a degree-d polynomial if it can be represented as

$$f(x) = \sum_{k=0}^{d} \sum_{i_1 \le \ldots \le i_k \in [n]} \alpha_{i_1, \ldots, i_k} \, x_{i_1} \cdots x_{i_k}$$

with real coefficients $\alpha_{i_1, \ldots, i_k}$. A polynomial is multilinear if each variable appears in every
monomial at most once. Equivalently, it can be represented as

$$f(x) = \sum_{k=0}^{d} \sum_{i_1 < \ldots < i_k \in [n]} \alpha_{i_1, \ldots, i_k} \, x_{i_1} \cdots x_{i_k}.$$

Each function $f : \{-1,1\}^n \to \mathbb{R}$ can be uniquely represented by a multilinear polynomial.
We will interchangeably regard $f$ both as a boolean function and as a multilinear polynomial.

2.2 Decision trees 

A Decision Tree over binary variables $x_1, \ldots, x_n$ is a binary tree, where each internal node v
is labeled by one of the variables $x_v$, such that the labels along any path from the root to a
leaf are distinct. Also, the two (directed) edges that leave each node are labeled by $-1$ and
$1$. Therefore, given a path P from the root to a leaf, for every variable x that appears along
the path we can uniquely define a value $x_P \in \{-1,1\}$ to be the label of the edge in P that
leaves the node labeled by x.

A path P from the root to a leaf $\ell$ defines a partial assignment $A_\ell$ by assigning to every
variable x that appears on P the value $x_P$. All the variables that do not appear on P remain
unassigned.

We denote the set of variables labeling the vertices in the path to $\ell$ by $\mathrm{var}(\ell)$. We denote
the set of leaves of a decision tree D by $L(D)$.

The depth of a leaf is the length of the path from the root to it, and the depth of a
decision tree is the maximal depth of a leaf.

With a slight abuse of notation, we define a random leaf in a decision tree to be the
result of the following procedure. We start at the root, and at each step we move to one of
the current node's children, uniformly and independently of the other choices. When we arrive
at a leaf $\ell$ we output it. Equivalently, we choose each leaf $\ell$ with probability $2^{-\mathrm{depth}(\ell)}$.

We can now define the restriction of a function with respect to a certain leaf $\ell$ and with
respect to a decision tree D.

Definition 11. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a function, D be a decision tree on $x_1, \ldots, x_n$ and
$\ell$ be a leaf in D. We define the restriction of $f$ to $\ell$, denoted by $f|_\ell$, to be the function
obtained from $f$ after assigning the variables $x_1, \ldots, x_n$ according to $A_\ell$. Namely, the domain
of $f|_\ell$ is $\{-1,1\}^{[n] \setminus \mathrm{var}(\ell)}$, and the range of $f|_\ell$ is $\mathbb{R}$.

Similarly, given a distribution $\mathcal{D}$, define its restriction to $\ell$, $\mathcal{D}|_\ell$, to be the distribution
obtained from $\mathcal{D}$ by conditioning on the partial assignment $A_\ell$.

We define a random function $f|_D$ by choosing a random leaf $\ell$ of D and restricting $f$ to $\ell$.
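
The definitions of this subsection can be made concrete with a small Python sketch. It is illustrative only: the nested-tuple encoding of a tree and the helper names `leaves` and `restrict` are assumptions of this example. It enumerates the leaves of a decision tree together with their probabilities $2^{-\mathrm{depth}(\ell)}$, and builds the restriction $f|_\ell$ for one leaf.

```python
from fractions import Fraction

# Assumed encoding: a tree is None (a leaf) or
# (variable_index, subtree_for_value_-1, subtree_for_value_+1).
tree = (0, None, (1, None, None))

def leaves(t, assignment=None):
    """Yield (partial_assignment, probability) for every leaf of the tree."""
    assignment = assignment or {}
    if t is None:
        # Labels on a path are distinct, so depth == number of assigned variables.
        yield dict(assignment), Fraction(1, 2 ** len(assignment))
        return
    var, neg, pos = t
    yield from leaves(neg, {**assignment, var: -1})
    yield from leaves(pos, {**assignment, var: 1})

def restrict(f, assignment):
    """f restricted to a leaf: assigned variables are fixed, the rest stay free."""
    def g(free):  # `free` maps the unassigned variable indices to values
        return f({**free, **assignment})
    return g

all_leaves = list(leaves(tree))
total_probability = sum(p for _, p in all_leaves)

f = lambda x: x[0] * x[1] + x[2]
f_leaf = restrict(f, {0: 1, 1: -1})   # the leaf reached by x_0 = 1, x_1 = -1
```

Choosing a leaf with these probabilities and applying `restrict` gives one sample of the random function $f|_D$.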
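We will need the following easy claim.

Claim 12. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a function, and D a decision tree, such that

$$\mathbb{P}_{\ell \in L(D)}[\text{k-wise independent distributions fool } f|_\ell \text{ with error } \varepsilon] \ge 1 - \delta.$$

Then $(k + \mathrm{depth}(D))$-wise independent distributions fool $f$ with error $\varepsilon + \delta$.

Proof. Let K be some $k'$-wise independent distribution for $k' = k + \mathrm{depth}(D)$. For any leaf
$\ell \in L(D)$, the restriction of K given by $\ell$ is k-wise independent. Moreover, since
$k' \ge \mathrm{depth}(D)$, the leaf reached by an input drawn from K is distributed as a random leaf.

Let $\ell \in L(D)$ be a random leaf of D. Say $\ell$ is good if k-wise independent distributions
fool $f|_\ell$ with error $\varepsilon$. By our assumption $\ell$ is good with probability at least $1 - \delta$.

Let $t \in \mathbb{R}$. For any good leaf we have

$$\left| \mathbb{P}_{x \in U|_\ell}[f(x) \le t] - \mathbb{P}_{x \in K|_\ell}[f(x) \le t] \right| \le \varepsilon.$$

For any other leaf we can bound

$$\left| \mathbb{P}_{x \in U|_\ell}[f(x) \le t] - \mathbb{P}_{x \in K|_\ell}[f(x) \le t] \right| \le 1.$$

Hence we get

$$\left| \mathbb{P}_{x \in U}[f(x) \le t] - \mathbb{P}_{x \in K}[f(x) \le t] \right| \le \mathbb{E}_{\ell \in L(D)} \left| \mathbb{P}_{x \in U|_\ell}[f(x) \le t] - \mathbb{P}_{x \in K|_\ell}[f(x) \le t] \right| \le \varepsilon + \delta.$$

□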

We will also require a bound on the $L_2$ norm of linear functions, under a partial restriction
given by a decision tree.

Lemma 13. Let $g : \{-1,1\}^n \to \mathbb{R}$ be a linear function with $\mathbb{E}[g^2] = 1$. Let D be a decision
tree. Then

$$\mathbb{P}_{\ell \in L(D)}[\mathbb{E}[(g|_\ell)^2] \ge t] \le 3 e^{-t/8}.$$

Proof. We will need the following variant of the Azuma-Hoeffding inequality. Let $X_1, \ldots, X_n$
be random variables, such that $X_i = c_i(X_1, \ldots, X_{i-1})$ or $X_i = -c_i(X_1, \ldots, X_{i-1})$, each with
probability 1/2, where $c_i : \{-1,1\}^{i-1} \to \mathbb{R}$ is some deterministic function, such that a.s.
$X_1^2 + \ldots + X_n^2 \le 1$. We will prove that

$$\mathbb{P}[X_1 + \ldots + X_n \ge t] \le e^{-t^2/2}.$$

First we show how we apply this inequality. Let $g(x) = a_0 + \sum_{i=1}^{n} a_i x_i$ where $a_0^2 + \sum_{i=1}^{n} a_i^2 = 1$.
Let $\ell$ be a leaf of D. Notice that $g|_\ell(x) = (a_0 + \sum_{i \in \mathrm{var}(\ell)} a_i x_i|_\ell) + \sum_{i \notin \mathrm{var}(\ell)} a_i x_i$. Hence, to bound
the probability that $\mathbb{E}[(g|_\ell)^2]$ is large, we need to bound the probability that $\sum_{i \in \mathrm{var}(\ell)} a_i x_i|_\ell$
is large. We will assume w.l.o.g. that $t \ge 8$ since otherwise the required inequality holds
immediately.

Define a sequence of random variables $X_1, X_2, \ldots$. Let $i_1$ be the index of the first variable
queried by D. Define $X_1 = \pm a_{i_1}$. Given the value of $x_{i_1}$, let $i_2$ be the index of the second
variable queried by D. Define $X_2 = \pm a_{i_2}$. Notice that in fact $X_2 = \pm c_2(X_1)$. Let $i_3$ be the
index of the third variable queried by D, and define $X_3 = \pm a_{i_3}$. Again, $X_3 = \pm c_3(X_1, X_2)$,
and we continue until we reach a leaf. If the leaf of D is reached after d queries, we define the
remaining variables $X_{d+1}, \ldots, X_n$ to be 0. Let $X = \sum_i X_i$. Notice that

$$g|_\ell(x) = (a_0 + X) + \sum_{i \notin \mathrm{var}(\ell)} a_i x_i.$$

Since the conditions of the inequality hold for $X_1, \ldots, X_n$, we get that $\mathbb{P}[X \ge t] \le e^{-t^2/2}$.
We wish to bound the probability over $\ell \in L(D)$ that $\mathbb{E}[(g|_\ell)^2] \ge t$. If this event occurs,
then we must have $X \ge \sqrt{t - 1} - 1$. Since we assume $t \ge 8$ this gives $X \ge \sqrt{t}/2$, which gives

$$\mathbb{P}_{\ell \in L(D)}[\mathbb{E}[(g|_\ell)^2] \ge t] \le \mathbb{P}[X \ge \sqrt{t}/2] \le e^{-t/8}.$$

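We now turn to prove the modification of the Azuma-Hoeffding inequality. Set $\lambda > 0$
to be determined later, and consider $E = \mathbb{E}[e^{\lambda (X_1 + \ldots + X_n)}]$. We can decompose $E$ as
$\prod_{i=1}^{n} \mathbb{E}[e^{\lambda X_i} \mid X_1, \ldots, X_{i-1}]$. We have

$$\mathbb{E}[e^{\lambda X_i} \mid X_1, \ldots, X_{i-1}] = \frac{1}{2} e^{\lambda c_i(X_1, \ldots, X_{i-1})} + \frac{1}{2} e^{-\lambda c_i(X_1, \ldots, X_{i-1})}.$$

Using the inequality $\frac{1}{2}(e^{x} + e^{-x}) \le e^{x^2/2}$ we get

$$\mathbb{E}[e^{\lambda X_i} \mid X_1, \ldots, X_{i-1}] \le e^{\lambda^2 c_i(X_1, \ldots, X_{i-1})^2 / 2}.$$

Hence, since $X_1^2 + \ldots + X_n^2 \le 1$ almost surely,

$$E \le e^{\lambda^2 / 2}.$$

Thus, by Markov's inequality, we get

$$\mathbb{P}[X_1 + \ldots + X_n \ge t] \le e^{\lambda^2/2 - \lambda t}.$$

Setting $\lambda = t$ gives the required inequality. □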
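
The inequality just proved can be checked exactly on a small adaptive example. The Python sketch below is illustrative only: the particular adaptive rule for the step sizes is hypothetical and was chosen for this example so that the squares always sum to less than 1. It enumerates all sign patterns and compares the exact tail probability with $e^{-t^2/2}$.

```python
from itertools import product
from math import exp, sqrt

def walk(signs):
    """Return X_1 + ... + X_n for one sign pattern, where |X_i| = c_i is chosen
    adaptively from the earlier signs and the sum of c_i^2 never exceeds 1."""
    budget, total, last = 1.0, 0.0, 1
    for s in signs:
        # c_i^2 depends only on the history (the previous sign and what is left).
        c_sq = budget / 2 if last > 0 else budget / 4
        total += s * sqrt(c_sq)
        budget -= c_sq
        last = s
    return total

def tail(n, t):
    """Exact P[X_1 + ... + X_n >= t] when every sign is a fair coin."""
    sums = [walk(s) for s in product((-1, 1), repeat=n)]
    return sum(v >= t for v in sums) / len(sums)

checks = [(t, tail(8, t), exp(-t * t / 2)) for t in (0.5, 1.0, 1.5, 2.0)]
```

Each entry of `checks` holds the threshold, the exact tail probability and the bound, so the two can be compared side by side.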

2.3 Analytical tools 

The Lipschitz constant of a function bounds the change in the function value when the inputs
are perturbed. It will be convenient for us to measure distance in the $L_\infty$ norm. Recall that
for $z = (z_1, \ldots, z_m) \in \mathbb{R}^m$, its $L_\infty$ norm is defined as the maximal absolute value of its
coordinates, i.e.

$$\|z\|_\infty = \max\{|z_i| : i \in [m]\}.$$

Definition 14 (Lipschitz constant). Let $F : \mathbb{R}^m \to \mathbb{R}$ be a function. The Lipschitz constant
of F, denoted by $L(F)$, is defined as

$$L(F) = \sup_{z', z'' \in \mathbb{R}^m} \frac{|F(z') - F(z'')|}{\|z' - z''\|_\infty}.$$

The function F is said to be Lipschitz if $L(F) < \infty$.

Let C be a convex subset of $\mathbb{R}^m$. The Lipschitz constant of F restricted to C, denoted
$L_C(F)$, is defined as

$$L_C(F) = \sup_{z', z'' \in C} \frac{|F(z') - F(z'')|}{\|z' - z''\|_\infty}.$$

We will use restricted Lipschitz constants only for cubes.

Definition 15. The cubic $\varepsilon$-neighborhood of a point $z \in \mathbb{R}^m$ is defined as

$$C(z, \varepsilon) = \{z' \in \mathbb{R}^m : \|z - z'\|_\infty \le \varepsilon\}.$$

For a set $S \subseteq \mathbb{R}^m$, the cubic $\varepsilon$-neighborhood of S is defined as

$$C(S, \varepsilon) = \bigcup_{z \in S} C(z, \varepsilon).$$

2.4 Tail estimates for polynomials 

In this subsection we prove two results about the concentration of degree-d multilinear
polynomials. The first result gives a tail estimate on the probability that a degree-d polynomial is
very large, and the second result bounds the probability that it is concentrated
near a certain value. In both results we apply techniques based on hyper-contractivity [23].

2.4.1 Tail bounds

We prove in this subsection a general tail estimate on multilinear polynomials, which holds
both under the uniform distribution over $\{-1,1\}^n$ and under the standard multi-normal
distribution. Namely, we show that for any degree-d multilinear polynomial $f(x_1, \ldots, x_n)$,
the probability that $|f(x)| \ge t$ is bounded by $\exp(-t^{2/d})$. We observe that this is tight by
considering the polynomial obtained by multilinearizing $f(x) = (x_1 + \ldots + x_n)^d$. Our main
result follows.

Lemma 16. Let $f(x_1, \ldots, x_n)$ be a multilinear degree-d polynomial with $\mathbb{E}[f^2] = 1$. Then
for every $t \ge 1$,

$$\mathbb{P}_{x \in U}[|f(x)| \ge t] \le 2^{-\frac{d}{4} t^{2/d}}$$

and

$$\mathbb{P}_{x \in N}[|f(x)| \ge t] \le 2^{-\frac{d}{4} t^{2/d}}.$$

Let X be a real random variable. Denote $\|X\|_q = (\mathbb{E}[|X|^q])^{1/q}$. Following the notation
from [23], we say that X is $(2, q, \eta)$-hyper-contractive if for every $a \in \mathbb{R}$,

$$\|a + \eta X\|_q \le \|a + X\|_2.$$

We use the following two theorems from [23].
We use the following two theorems from |23j . 
Lemma 17 (Theorem 3.13 in [23]). If X is uniform on $\{-1,1\}$, or a standard normal
random variable $N(0,1)$, then for every $q \ge 2$, X is $(2, q, \eta)$-hyper-contractive with
$\eta = (q-1)^{-1/2}$.

Lemma 18 (Proposition 3.12 in [23]). Let X be $(2, q, \eta)$-hyper-contractive. Let $f(x_1, \ldots, x_n)$
be a multilinear degree-d polynomial. Let $Q = f(X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are i.i.d. and
distributed according to X. Then

$$\|Q\|_q \le \eta^{-d} \|Q\|_2.$$

Proof of Lemma 16. Let X be either a uniform random variable over $\{-1,1\}$ or a standard
normal random variable $N(0,1)$. Let $Q = f(X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are i.i.d. and
distributed according to X. In either case we have $\|Q\|_2 = \mathbb{E}[f^2]^{1/2} = 1$. Fix $q > 2$ to be
determined later. By Lemma 17, X is $(2, q, \eta)$-hyper-contractive for $\eta = (q-1)^{-1/2}$. Thus, by
Lemma 18 we have

$$\mathbb{E}_{x \in X^n}[|f(x)|^q] \le (q-1)^{dq/2}.$$

Thus by Markov's inequality

$$\mathbb{P}_{x \in X^n}[|f(x)| \ge t^{d/2}] \le \left( \frac{q-1}{t} \right)^{dq/2}.$$

Since $t \ge 1$ we can set $q = t/2 + 1$ and get

$$\mathbb{P}_{x \in X^n}[|f(x)| \ge t^{d/2}] \le 2^{-td/4}.$$

Hence, substituting $t^{2/d}$ for $t$, we conclude

$$\mathbb{P}_{x \in X^n}[|f(x)| \ge t] \le 2^{-\frac{d}{4} t^{2/d}}.$$

□
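
As a sanity check of the tail bound under the uniform distribution, one can compare it with exact tail probabilities of a small polynomial. The Python sketch below is illustrative only: the dictionary encoding of a multilinear polynomial, the helper names, and the test polynomial (the normalized sum of all degree-2 monomials on 8 variables) are assumptions of this example.

```python
from itertools import combinations, product
from math import prod, sqrt

def tail_prob(coeffs, n, t):
    """Exact P_{x uniform}[|f(x)| >= t] for a multilinear f given as
    {tuple of variable indices: coefficient}."""
    hits = 0
    for x in product((-1, 1), repeat=n):
        val = sum(c * prod(x[i] for i in mono) for mono, c in coeffs.items())
        hits += abs(val) >= t
    return hits / 2 ** n

def lemma16_bound(d, t):
    """The bound 2^{-(d/4) * t^(2/d)} as stated in the lemma (valid for t >= 1)."""
    return 2 ** (-(d / 4) * t ** (2 / d))

n, d = 8, 2
pairs = list(combinations(range(n), 2))
# Equal coefficients, scaled so that E[f^2] = sum of squared coefficients = 1.
f = {mono: 1 / sqrt(len(pairs)) for mono in pairs}

results = [(t, tail_prob(f, n, t), lemma16_bound(d, t)) for t in (1.5, 2.0, 3.0, 5.0)]
```

Each entry of `results` lists a threshold, the exact tail probability and the bound; on this example the exact probabilities sit well below the bound.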

2.4.2 Concentration lower bounds 

The main result of this subsection is the following lemma. 

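Lemma 19. There exist constants $c_1, c_2 > 0$ such that the following holds. Let $f(x_1, \ldots, x_n)$
be a polynomial of degree d such that $\mathrm{Var}[f] = 1$. For $\varepsilon > 0$ let $\alpha = (c_1 \cdot \varepsilon/d)^d$ and
$\tau = (c_2 \cdot \varepsilon/d)^{8d}$. If $\mathrm{Inf}_\infty(f) \le \tau$, then for every $t \in \mathbb{R}$,

$$\mathbb{P}_{x \in U}[|f(x) - t| \le \alpha] \le \varepsilon.$$

We use the following two theorems.

Lemma 20 (Theorem 2.1 in [23]). Let $f(x_1, \ldots, x_n)$ be a multilinear degree-d polynomial,
such that $\mathrm{Inf}_\infty(f) \le \tau$. Then for every $t \in \mathbb{R}$,

$$\left| \mathbb{P}_{x \in U}[f(x) \le t] - \mathbb{P}_{x \in N}[f(x) \le t] \right| \le O(d \tau^{1/8d}).$$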
The following is an immediate corollary of Theorem 8 in Carbery and Wright [12], which
is also stated as Corollary 3.23 in [23].

Lemma 21. Let $f(x_1, \ldots, x_n)$ be a multilinear degree-d polynomial such that $\mathrm{Var}[f] = 1$.
Then for every $t \in \mathbb{R}$,

$$\mathbb{P}_{x \in N}[|f(x) - t| \le \alpha] \le O(d \alpha^{1/d}).$$

Proof of Lemma 19. Let $f$ be a degree-d polynomial such that $\mathrm{Inf}_\infty(f) \le \tau$. By Lemma 20
we have:

$$\mathbb{P}_{x \in U}[|f(x) - t| \le \alpha] \le \mathbb{P}_{x \in N}[|f(x) - t| \le \alpha] + O(d \tau^{1/8d}).$$

By Lemma 21 we have

$$\mathbb{P}_{x \in N}[|f(x) - t| \le \alpha] \le O(d \alpha^{1/d}).$$

Combining the two results we get:

$$\mathbb{P}_{x \in U}[|f(x) - t| \le \alpha] \le O(d \cdot (\tau^{1/8d} + \alpha^{1/d})).$$

Setting $\alpha = (c_1 \cdot \varepsilon/d)^d$ and $\tau = (c_2 \cdot \varepsilon/d)^{8d}$ for some absolute constants $c_1, c_2 > 0$ we get

$$\mathbb{P}_{x \in U}[|f(x) - t| \le \alpha] \le \varepsilon.$$

□ 
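
The final substitution in the proof above can be written out explicitly, with $\alpha$ and $\tau$ as in Lemma 19. In the sketch below, $C$ is a name introduced here for the absolute constant hidden in the $O(\cdot)$ of the combined bound; it does not appear in the original text.

```latex
\[
  C \cdot d \cdot \left( \tau^{1/8d} + \alpha^{1/d} \right)
  = C \cdot d \cdot \left( \frac{c_2 \varepsilon}{d} + \frac{c_1 \varepsilon}{d} \right)
  = C (c_1 + c_2)\, \varepsilon
  \le \varepsilon
  \quad \text{whenever } c_1 + c_2 \le 1/C .
\]
```

So any choice of positive constants with $c_1 + c_2 \le 1/C$ suffices, and the exponents $d$ and $8d$ in the definitions of $\alpha$ and $\tau$ are exactly what cancels the roots in Lemmas 21 and 20.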

3 The effect of partial assignments 

We prove in this section that functions with many influential variables cannot be non-trivially
approximated by low-degree PTFs. The proof depends on a new general structural result for
polynomials and polynomial threshold functions. We show that for every such function there
exists a small-depth decision tree D, such that $f|_D$ has low influence with high probability.

Lemma 22. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-d polynomial, and let $h(x) = \mathrm{sgn}(f(x))$. For
every $\varepsilon, \delta > 0$, there exists a decision tree D of depth at most $2^{O(d/\delta)} \cdot \log(1/\varepsilon)$, such that

$$\mathbb{P}_{\ell \in L(D)}[\mathrm{Inf}_\infty(f|_\ell) > \delta] \le \varepsilon$$

and

$$\mathbb{P}_{\ell \in L(D)}[\mathrm{Inf}_\infty(h|_\ell) > \delta'] \le \varepsilon$$

for $\delta' = O(d \cdot \delta^{1/8d})$.

The proof of Lemma 22 appears in Subsection 3.1.

We apply Lemma 22 in order to prove our main result of this section, that functions
with many influential variables cannot be approximated by low-degree PTFs. We restate
Theorem 2 for the convenience of the reader.

Theorem 23 (Theorem 2 restated). Let $g : \{-1,1\}^n \to \{-1,1\}$ be a boolean function,
such that $\mathrm{Inf}_i(g) \ge \tau$ for at least $n^{\alpha}$ variables. Then for any degree-d polynomial threshold
function $h$ we have

$$\mathbb{P}_x[h(x) = g(x)] \le 1 - \frac{\tau}{2} + \eta$$

where $\eta = O(d / (\alpha \log n)^{1/8d})$.

Before proving Theorem 23 we give a couple of examples of its application. We show
that low-degree PTFs do not admit a non-trivial approximation for the $\mathrm{MOD}_m$ function, or
for low-degree polynomials over $\mathbb{F}_2$.

Corollary 24 (Corollary 3 restated). Let $h : \{-1,1\}^n \to \{-1,1\}$ be a degree-$d$ polynomial
threshold function for $d = O(\log\log n / \log\log\log n)$. Then

$$\Pr[h(x) = \mathrm{MOD}_m(x)] \le 1 - \frac{1}{m} + o(1).$$

Proof. It is straightforward to verify that $\mathrm{Inf}_i(\mathrm{MOD}_m) = \frac{2}{m}$ for all $i \in [n]$; the proof now
follows by Theorem 23. □
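The influence computation behind Corollary 24 can be checked numerically. The sketch below is illustrative only and rests on an assumed definition that is not restated in this section: $\mathrm{MOD}_m(x) = 1$ exactly when the number of coordinates equal to $-1$ is divisible by $m$ (with $m \ge 2$). The function names are ours. Under this assumption, flipping $x_i$ changes the value exactly when the count over the other $n-1$ coordinates is $0$ or $-1$ modulo $m$, which gives an exact binomial formula that we compare against brute-force enumeration.

```python
from itertools import product
from math import comb


def mod_m(x, m):
    """Assumed definition: +1 iff the number of -1 coordinates is divisible by m."""
    return 1 if sum(1 for v in x if v == -1) % m == 0 else -1


def mod_m_influence(n, m):
    """Exact Inf_i(MOD_m) for m >= 2 via binomial counts.

    The value flips iff the count c of -1's among the other n - 1
    coordinates satisfies c = 0 or c = -1 (mod m).
    """
    hits = sum(comb(n - 1, c) for c in range(n) if c % m in (0, m - 1))
    return hits / 2 ** (n - 1)


def brute_force_influence(n, m, i=0):
    """Inf_i(MOD_m) by enumerating all 2^n points of the cube."""
    flips = 0
    for x in product((-1, 1), repeat=n):
        y = list(x)
        y[i] = -y[i]
        flips += mod_m(x, m) != mod_m(y, m)
    return flips / 2 ** n


if __name__ == "__main__":
    for n, m in [(6, 3), (12, 3), (30, 3), (30, 5)]:
        print(n, m, mod_m_influence(n, m), 2 / m)
```

For moderate $n$ the exact value is already very close to $2/m$, which is the quantity that Theorem 23 turns into the $1 - \frac{1}{m} + o(1)$ bound.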

Corollary 25 (Corollary 4 restated). Let $q : \{-1,1\}^n \to \{-1,1\}$ be a degree-$r$ polynomial
over $\mathbb{F}_2$ depending on all variables. Let $h : \{-1,1\}^n \to \{-1,1\}$ be a degree-$d$ polynomial
threshold function for $d \le O(\log\log n / \log\log\log n)$. Then

$$\Pr[h(x) = q(x)] \le 1 - 2^{-r} + o(1).$$

Proof. We will prove $\mathrm{Inf}_i(q) \ge 2^{1-r}$ for all $i \in [n]$; the corollary then follows by Theorem 23. Let $q(x) = (-1)^{q'(x')}$, where $q' : \mathbb{F}_2^n \to \mathbb{F}_2$
and $x' \in \mathbb{F}_2^n$ is set by $x_i = (-1)^{x'_i}$. We will in fact show that $\Pr[q'(x') \ne q'(x' \oplus e_i)] \ge 2^{1-r}$.
Write $q'(x') = x'_i q'_1(x') + q'_2(x')$, where $q'_1$ and $q'_2$ do not depend on $x'_i$, so that flipping $x'_i$ changes $q'$ exactly when $q'_1(x') = 1$. As $q'_1$ is a non-zero polynomial of degree at most $r - 1$, we
have $\Pr[q'_1(x') = 1] \ge 2^{1-r}$. □
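As a sanity check of the bound $\mathrm{Inf}_i(q) \ge 2^{1-r}$, the following sketch computes all influences of a small $\mathbb{F}_2$ polynomial by brute force. The example polynomial $q'(x') = x'_0 x'_1 x'_2 + x'_2 x'_3 + x'_4$ (degree $r = 3$, $n = 5$) and the helper names are our own illustrative choices, not taken from the paper.

```python
from itertools import product


def f2_poly(monomials):
    """Build q' : F_2^n -> F_2 from a list of monomials (tuples of indices)."""
    def q(bits):
        return sum(all(bits[i] for i in mono) for mono in monomials) % 2
    return q


def influence(q, n, i):
    """Pr over uniform x' in F_2^n that flipping coordinate i changes q(x')."""
    flips = 0
    for bits in product((0, 1), repeat=n):
        flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
        flips += q(bits) != q(flipped)
    return flips / 2 ** n


if __name__ == "__main__":
    r, n = 3, 5
    q = f2_poly([(0, 1, 2), (2, 3), (4,)])
    print([influence(q, n, i) for i in range(n)])  # [0.25, 0.25, 0.5, 0.5, 1.0]
    print("lower bound 2^(1-r) =", 2 ** (1 - r))   # 0.25
```

The variables $x'_0$ and $x'_1$ meet the bound with equality, since their derivative polynomials $x'_1 x'_2$ and $x'_0 x'_2$ equal $1$ with probability exactly $2^{1-r} = 1/4$.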

We now return to prove Theorem 23.

Proof of Theorem 23. Let $g : \{-1,1\}^n \to \{-1,1\}$ be a boolean function for which $\mathrm{Inf}_i(g) \ge \tau$
for at least $n' = n^\alpha$ variables. We will provide a lower bound on $q = \Pr[g(x) \ne h(x)]$.

Set $\delta > 0$ and $\epsilon > 0$ to be determined later. Set $m = 2^{ed/\delta} \log(1/\epsilon)$ and $\delta' = O(d \cdot \delta^{1/8d})$.
Using Lemma 22 we get that there exists a decision tree $D$ of depth at most $m$, such that

$$\Pr_{\ell \in L(D)}[\mathrm{Inf}_\infty(h|_\ell) > \delta'] \le \epsilon.$$

In each path in $D$ there are at most $m$ variables. Thus, there exists a variable $x_i$ for
which $\mathrm{Inf}_i(g) \ge \tau$ which appears in at most an $m/n'$ fraction of the paths. Equivalently, a random leaf
$\ell \in L(D)$ assigns a value to $x_i$ with probability at most $m/n'$. We get

$$\begin{aligned}
\Pr[g(x) \ne g(x \oplus e_i)] &\le \mathbb{E}_{\ell \in L(D)}\big[\Pr_x[g|_\ell(x) \ne g|_\ell(x \oplus e_i)]\big] + m/n' \\
&\le \mathbb{E}_{\ell \in L(D)}\big[\Pr_x[g|_\ell(x) \ne h|_\ell(x)] + \Pr_x[h|_\ell(x) \ne h|_\ell(x \oplus e_i)] \\
&\qquad + \Pr_x[h|_\ell(x \oplus e_i) \ne g|_\ell(x \oplus e_i)]\big] + m/n' \\
&= 2\Pr[g(x) \ne h(x)] + \mathbb{E}_{\ell \in L(D)}[\mathrm{Inf}_i(h|_\ell)] + m/n' \\
&\le 2q + \delta' + \epsilon + m/n'.
\end{aligned}$$

On the other hand, by assumption we have $\Pr[g(x) \ne g(x \oplus e_i)] \ge \tau$. Combining the two
bounds we get that

$$\begin{aligned}
\Pr[g(x) \ne h(x)] = q &\ge \frac{1}{2}(\tau - \epsilon - \delta' - m/n') \\
&\ge \frac{\tau}{2} - O(\epsilon + d\delta^{1/8d} + 2^{ed/\delta} \log(1/\epsilon)/n').
\end{aligned}$$

Setting $\delta = O(d/\log n')$ and $\epsilon$ small enough (for example $\epsilon = 1/n'$) gives

$$q = \Pr[g(x) \ne h(x)] \ge \frac{\tau}{2} - \eta$$

for $\eta = O(d/(\alpha \log n)^{1/8d})$. □

3.1 Proof of Lemma 22

The proof of Lemma 22 will be conducted in three steps. First we show that for every
low-degree polynomial there exists a partial assignment of a small set of variables under
which we get a polynomial with low influences. We then argue that if a polynomial has low
influences, then so does its threshold. We then conclude by showing that if there is a single
good assignment, then by taking a larger set of variables we get that most of the assignments
are good. The first step is accomplished by the following lemma.

Lemma 26. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial. For every $\delta > 0$ there exist a
set of variables $x_{i_1}, \ldots, x_{i_k}$ and assignments for these variables $b_{i_1}, \ldots, b_{i_k} \in \{-1,1\}$, such
that

$$\mathrm{Inf}_\infty(f|_{x_{i_1}=b_{i_1},\ldots,x_{i_k}=b_{i_k}}) \le \delta$$

and $k \le ed/\delta$.

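Proof. We construct a sequence of assignments for the variables of $f$, assigning a value to a
single variable at each step, that will lead eventually to a polynomial $f|_{x_{i_1}=b_{i_1},\ldots,x_{i_k}=b_{i_k}}$ whose
influence is bounded by $\delta$.

Every degree-$d$ polynomial $f$ can be uniquely represented as

$$f(x) = \sum_{I \subseteq [n], |I| \le d} f_I \prod_{i \in I} x_i.$$

For $\alpha \ge 0$ define the operator $V_\alpha(f)$ to be

$$V_\alpha(f) = \sum_{I \subseteq [n], |I| \le d} f_I^2 (1 + \alpha)^{|I|}.$$

Note that $V_0(f) = \mathbb{E}[f^2]$.

Fix a variable $x_i$, and let $f(x) = x_i f_1(x') + f_2(x')$ where $x' = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$.
We have $f|_{x_i=1} = f_1 + f_2$ and $f|_{x_i=-1} = -f_1 + f_2$. Notice that $V_0(f_1) = \mathrm{Inf}_i(f) \cdot V_0(f)$.

We first claim that

$$\frac{1}{2}\big(V_\alpha(f|_{x_i=1}) + V_\alpha(f|_{x_i=-1})\big) = V_\alpha(f) - \alpha V_\alpha(f_1). \qquad (3.1)$$

To prove it, write $f_1(x') = \sum_I f_{1,I} \prod_{j \in I} x_j$ and $f_2(x') = \sum_I f_{2,I} \prod_{j \in I} x_j$. We have

$$\begin{aligned}
V_\alpha(f|_{x_i=1}) + V_\alpha(f|_{x_i=-1}) &= V_\alpha(f_1 + f_2) + V_\alpha(-f_1 + f_2) \\
&= \sum_I (f_{1,I} + f_{2,I})^2 (1+\alpha)^{|I|} + \sum_I (-f_{1,I} + f_{2,I})^2 (1+\alpha)^{|I|} \\
&= 2 \sum_I (f_{1,I}^2 + f_{2,I}^2)(1+\alpha)^{|I|} \\
&= 2\big(V_\alpha(f) - \alpha V_\alpha(f_1)\big),
\end{aligned}$$

where the last step uses $V_\alpha(f) = (1+\alpha)V_\alpha(f_1) + V_\alpha(f_2)$.
This proves (3.1). In particular for $\alpha = 0$ we get

$$\frac{1}{2}\big(V_0(f|_{x_i=1}) + V_0(f|_{x_i=-1})\big) = V_0(f), \qquad (3.2)$$

and for $\alpha > 0$ we have

$$\frac{1}{2}\big(V_\alpha(f|_{x_i=1}) + V_\alpha(f|_{x_i=-1})\big) \le V_\alpha(f) - \alpha \cdot \mathrm{Inf}_i(f) \cdot V_0(f),$$

since $V_\alpha(f_1) \ge V_0(f_1) = \mathrm{Inf}_i(f) \cdot V_0(f)$.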

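Define $S_\alpha(f) = \frac{V_\alpha(f)}{V_0(f)}$. We next prove that

$$\min\big(S_\alpha(f|_{x_i=1}), S_\alpha(f|_{x_i=-1})\big) \le S_\alpha(f) - \alpha \cdot \mathrm{Inf}_i(f).$$

By combining (3.1) and (3.2) we get

$$\begin{aligned}
S_\alpha(f) = \frac{V_\alpha(f)}{V_0(f)} &= \frac{V_\alpha(f|_{x_i=1}) + V_\alpha(f|_{x_i=-1})}{V_0(f|_{x_i=1}) + V_0(f|_{x_i=-1})} + \frac{\alpha V_\alpha(f_1)}{V_0(f)} \\
&\ge \min\left(\frac{V_\alpha(f|_{x_i=1})}{V_0(f|_{x_i=1})}, \frac{V_\alpha(f|_{x_i=-1})}{V_0(f|_{x_i=-1})}\right) + \frac{\alpha V_0(f_1)}{V_0(f)} \\
&= \min\big(S_\alpha(f|_{x_i=1}), S_\alpha(f|_{x_i=-1})\big) + \alpha \cdot \mathrm{Inf}_i(f).
\end{aligned}$$

Consider the polynomial $f$. We first bound $S_\alpha(f)$,

$$S_\alpha(f) = \frac{V_\alpha(f)}{V_0(f)} \le (1+\alpha)^d.$$

Note that either $\mathrm{Inf}_\infty(f) \le \delta$, or there exists a variable $x_{i_1}$ such that

$$\min\big(S_\alpha(f|_{x_{i_1}=1}), S_\alpha(f|_{x_{i_1}=-1})\big) \le S_\alpha(f) - \alpha \cdot \delta.$$

Consider the restriction $f|_{x_{i_1}=b_{i_1}}$ for $b_{i_1} \in \{-1,1\}$ minimizing $S_\alpha(f|_{x_{i_1}=b_{i_1}})$. Either $\mathrm{Inf}_\infty(f|_{x_{i_1}=b_{i_1}}) \le \delta$,
or otherwise we could find another variable $x_{i_2}$ such that

$$\min\big(S_\alpha(f|_{x_{i_1}=b_{i_1},x_{i_2}=1}), S_\alpha(f|_{x_{i_1}=b_{i_1},x_{i_2}=-1})\big) \le S_\alpha(f|_{x_{i_1}=b_{i_1}}) - \alpha \cdot \delta.$$

Continuing in this fashion, since $S_\alpha \ge 0$, we must reach after at most $k \le \frac{(1+\alpha)^d}{\alpha\delta}$ steps
a polynomial $f|_{x_{i_1}=b_{i_1},\ldots,x_{i_k}=b_{i_k}}$ such that $\mathrm{Inf}_\infty(f|_{x_{i_1}=b_{i_1},\ldots,x_{i_k}=b_{i_k}}) \le \delta$. Choosing
$\alpha = \frac{1}{d}$ we get $k \le (1 + 1/d)^d \cdot d/\delta \le e \cdot d/\delta$. □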
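The potential-function argument above translates directly into a greedy procedure. The sketch below is a minimal illustration, not the authors' implementation: a polynomial is stored as a dictionary from monomials (frozensets of variable indices) to coefficients, $\mathrm{Inf}_i(f)$ is taken to be $\sum_{I \ni i} f_I^2 / \mathbb{E}[f^2]$ (the normalization consistent with $V_0(f_1) = \mathrm{Inf}_i(f) \cdot V_0(f)$), and all function names are ours. At each step the most influential variable is fixed to the value minimizing $S_\alpha$.

```python
import math
import random
from itertools import combinations


def v_alpha(poly, alpha):
    """V_alpha(f) = sum_I f_I^2 (1 + alpha)^|I|; V_0(f) = E[f^2]."""
    return sum(c * c * (1.0 + alpha) ** len(mono) for mono, c in poly.items())


def influence(poly, i):
    """Inf_i(f) = (sum of f_I^2 over I containing i) / V_0(f)."""
    total = v_alpha(poly, 0.0)
    if total == 0.0:
        return 0.0
    return sum(c * c for mono, c in poly.items() if i in mono) / total


def restrict(poly, i, b):
    """Return f with x_i fixed to b, merging monomials that coincide."""
    out = {}
    for mono, c in poly.items():
        if i in mono:
            mono, c = mono - {i}, c * b
        out[mono] = out.get(mono, 0.0) + c
    return out


def greedy_restriction(poly, delta, alpha):
    """Fix variables one at a time until every influence is at most delta."""
    assignment = []
    while True:
        variables = sorted(set().union(*poly))
        worst = max(variables, key=lambda i: influence(poly, i), default=None)
        if worst is None or influence(poly, worst) <= delta:
            return assignment, poly
        live = []
        for b in (1, -1):
            g = restrict(poly, worst, b)
            if v_alpha(g, 0.0) > 0.0:  # skip a branch that vanishes identically
                live.append((v_alpha(g, alpha) / v_alpha(g, 0.0), b, g))
        _, b, poly = min(live, key=lambda t: t[0])  # branch minimizing S_alpha
        assignment.append((worst, b))


def random_poly(n, d, seed=0):
    """Hypothetical test input: dense random polynomial of degree at most d."""
    rng = random.Random(seed)
    return {frozenset(s): rng.uniform(-1.0, 1.0)
            for k in range(d + 1) for s in combinations(range(n), k)}


if __name__ == "__main__":
    n, d, delta = 10, 3, 0.2
    assignment, g = greedy_restriction(random_poly(n, d, seed=1), delta, 1.0 / d)
    print("fixed", len(assignment), "variables; bound e*d/delta =", math.e * d / delta)
    print("max influence after restriction:", max(influence(g, i) for i in range(n)))
```

Since each step lowers $S_\alpha$ by more than $\alpha\delta$ and $S_\alpha$ starts at no more than $(1+\alpha)^d$, the loop stops within $e \cdot d/\delta$ steps for $\alpha = 1/d$, matching the lemma.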

We now show that if a polynomial has low influences, then so does its threshold.

Lemma 27. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial such that $\mathrm{Inf}_\infty(f) = \delta$. Let
$h(x) = \mathrm{sgn}(f(x))$. Then

$$\mathrm{Inf}_\infty(h) \le O(d \cdot \delta^{1/8d}).$$

Proof. Assume w.l.o.g. $\mathrm{Var}[f] = 1$, and we will bound $\mathrm{Inf}_i(h)$ for all $i = 1, \ldots, n$.

We first argue that if $\mathbb{E}[f^2]$ is large, then $h$ has low influences. Let $f(x) = c + f_0(x)$,
where $c$ is the free coefficient of $f$. We have $\mathrm{Var}[f] = \mathbb{E}[f_0^2] = 1$ and $\mathbb{E}[f^2] = 1 + c^2$. The
probability that $h(x) \ne h(0)$ is bounded by

$$\Pr[h(x) \ne h(0)] \le \Pr[|f_0(x)| \ge c] \le \frac{\mathbb{E}[f_0^2]}{c^2} = \frac{1}{c^2}.$$

Thus for large $c$ we get a bound on the influence of $h$, since

$$\mathrm{Inf}_i(h) = \Pr[h(x) \ne h(x \oplus e_i)] \le \Pr[h(x) \ne h(0)] + \Pr[h(x \oplus e_i) \ne h(0)] \le 2/c^2.$$

In particular if $c \ge \delta^{-1/4}$ we get that $\mathrm{Inf}_i(h) \le O(\delta^{1/2})$ and we are done. We thus assume
from now on that $c < \delta^{-1/4}$.

Let $f(x) = x_i f_1(x) + f_2(x)$, where $f_1, f_2$ do not depend on $x_i$. By our assumption on the
influences,

$$\mathbb{E}[f_1^2] = \mathrm{Inf}_i(f) \cdot \mathbb{E}[f^2] \le \delta(1 + c^2) \le 2\delta^{1/2}.$$

Set $\alpha = \delta^{1/8}$ and consider the following two cases.

(i). $|f(x)| \le \alpha$
(ii). $|f_1(x)| \ge \alpha$

If neither of these cases occur, then flipping $x_i$ does not change the sign of $f$. Thus we
can bound

$$\mathrm{Inf}_i(h) \le \Pr[|f(x)| \le \alpha] + \Pr[|f_1(x)| \ge \alpha].$$

We first estimate the first summand by Lemma 19. Set $\delta' \ge \max\left(\frac{d}{c_1}\alpha^{1/d}, \frac{d}{c_2}\delta^{1/8d}\right)$ where
$c_1, c_2$ are the constants in Lemma 19. We get

$$\Pr[|f(x)| \le \alpha] \le \delta' = O(d \cdot \delta^{1/8d}).$$

We proceed by estimating the second summand. By Markov's inequality we get

$$\Pr[|f_1(x)| \ge \alpha] \le \frac{\mathbb{E}[f_1^2]}{\alpha^2} \le \frac{2\delta^{1/2}}{\delta^{1/4}} = 2\delta^{1/4}.$$

Combining the two estimations we get that

$$\mathrm{Inf}_i(h) \le O(d \cdot \delta^{1/8d}),$$

as desired.

□
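For intuition on Lemma 27 in the simplest case $d = 1$, the following sketch compares the influences of a linear form with those of its threshold by exhaustive enumeration. It is only an illustration under our own conventions: `ltf_influences` is a hypothetical helper, $\mathrm{sgn}(0)$ is taken to be $+1$, and the normalized influence of the linear form $\sum_j w_j x_j$ is $w_i^2 / \sum_j w_j^2$.

```python
from itertools import product


def sgn(v):
    """Sign with the convention sgn(0) = +1 (an assumption of this sketch)."""
    return 1 if v >= 0 else -1


def ltf_influences(weights):
    """For f(x) = sum_j w_j x_j return pairs (Inf_i(f), Inf_i(sgn f))."""
    n = len(weights)
    norm = sum(w * w for w in weights)
    result = []
    for i in range(n):
        flips = 0
        for x in product((-1, 1), repeat=n):
            value = sum(w * v for w, v in zip(weights, x))
            flipped = value - 2 * weights[i] * x[i]  # value after negating x_i
            flips += sgn(value) != sgn(flipped)
        result.append((weights[i] ** 2 / norm, flips / 2 ** n))
    return result


if __name__ == "__main__":
    # Majority on 5 bits: polynomial influence 1/5, threshold influence 6/16.
    print(ltf_influences([1, 1, 1, 1, 1])[0])  # (0.2, 0.375)
```

For majority on $n$ bits the threshold influence is $\binom{n-1}{(n-1)/2} / 2^{n-1}$, which is larger than the polynomial influence $1/n$ but still tends to zero, in line with the qualitative statement of the lemma.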



We next prove Lemma 22. Using Lemma 26 we prove the existence of a small depth
decision tree, such that for most of its leaves, the polynomial restricted to the leaf has low
influences. We use Lemma 27 to argue that when this happens the threshold function also
has low influences.

Proof of Lemma 22. We first prove the lemma for a polynomial $f$, and then for a PTF $h$.
We build a decision tree $D$ in steps. At every step, some of the leaves of $D$ will be open, and
some will be closed. If a leaf $\ell$ is closed then $\mathrm{Inf}_\infty(f|_\ell) \le \delta$. A leaf is open if it is not closed.
Initially, our tree consists of a single vertex, the root, which is open.

Let $\ell$ be an open leaf, and consider the polynomial $f|_\ell$. By Lemma 26 there exist a set
of variables $x_{i_1}, \ldots, x_{i_k}$, $k \le \frac{ed}{\delta}$, and an assignment to these variables $b_{i_1}, \ldots, b_{i_k} \in \{-1,1\}$,
such that

$$\mathrm{Inf}_\infty\big((f|_\ell)|_{x_{i_1}=b_{i_1},\ldots,x_{i_k}=b_{i_k}}\big) \le \delta.$$

We add under $\ell$ a subtree whose leaves correspond to all the $2^k$ possible assignments
of $x_{i_1}, \ldots, x_{i_k}$. Note that at least one of the leaves in the new tree is closed, and the other
leaves may be either closed or open. Therefore, a random walk of length $k$ that starts at $\ell$
will end at a closed leaf with probability at least $2^{-k}$.

This process defines a tree $D'$ of depth at most $n$, as every variable appears in every path
at most once. Let $D(t)$ be the tree obtained by truncating $D'$ after $t$ rounds of this process, that is, at depth $t \cdot k$. Namely, the
depth of $D(t)$ is $t \cdot k$. The probability that a random walk that starts from the root will end
at an open leaf is at most $(1 - 2^{-k})^t \le e^{-t \cdot 2^{-k}}$. Thus, setting $t = \log(1/\epsilon) \cdot 2^{ed/\delta}$ will guarantee
that a random leaf in $D = D(t)$ is closed with probability at least $1 - \epsilon$, as required.
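The counting step above is a standard geometric estimate, which the short sketch below makes concrete. It assumes that $\log$ denotes the natural logarithm and that each round of $k$ queries closes the current leaf with probability at least $2^{-k}$; the function names are ours.

```python
import math


def rounds_needed(k, eps):
    """Smallest integer t with t >= ln(1/eps) * 2^k."""
    return math.ceil(math.log(1.0 / eps) * 2 ** k)


def open_probability(k, t):
    """Upper bound (1 - 2^-k)^t on the chance that a walk is still open after t rounds."""
    return (1.0 - 2.0 ** (-k)) ** t


if __name__ == "__main__":
    for k in (1, 3, 6):
        for eps in (0.1, 0.01):
            t = rounds_needed(k, eps)
            print(k, eps, t, open_probability(k, t))
```

The inequality $(1 - p)^t \le e^{-pt}$ with $p = 2^{-k}$ is what makes $t = \ln(1/\epsilon) \cdot 2^k$ rounds sufficient.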

We proceed by proving the second item. Let $h$ be a PTF as stated, and observe that by
Lemma 27, for any leaf $\ell$ for which $\mathrm{Inf}_\infty(f|_\ell) \le \delta$ we have that $\mathrm{Inf}_\infty(\mathrm{sgn}(f|_\ell)) \le O(d\delta^{1/8d}) = \delta'$.
Since $\mathrm{sgn}(f|_\ell) = \mathrm{sgn}(f)|_\ell = h|_\ell$, we get

$$\Pr_{\ell \in L(D)}[\mathrm{Inf}_\infty(h|_\ell) > \delta'] \le \epsilon.$$

□

4 Fooling threshold of polynomials depending on a few linear functions

Recall that the weight of a polynomial $G : \mathbb{R}^m \to \mathbb{R}$ is the sum of the absolute values of the
coefficients of its monomials, excluding the free coefficient. Our main result in this section
is Theorem 28, which is stated below.

Theorem 28 (Theorem [T], restated). Fix $\epsilon > 0$. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$
polynomial, which can be decomposed as $f(x) = G(g_1(x), \ldots, g_m(x))$ where

(i). The functions $g_1, \ldots, g_m$ are linear with $\mathbb{E}[g_1^2] = \ldots = \mathbb{E}[g_m^2] = 1$.

(ii). $G$ is a degree-$d$ polynomial.

Then $k$-wise distributions $\epsilon$-fool $\mathrm{sgn}(f)$ for $k = \exp(O(d/\epsilon)^d) + \mathrm{poly}((\log m \cdot d/\epsilon)^d, m, wt(G))$.

The main lemma shows that any multivariate Lipschitz function $F$ admits a polynomial
$p$ with the following two properties. The polynomial $p$ bounds $F$ from above everywhere,
and $p$ approximates $F$ in a cube around the origin.

Lemma 29. Let $F : \mathbb{R}^m \to [-1,1]$ be a Lipschitz function. Let $A > 0$ and $0 < \epsilon < 1$ be
arbitrary. There exists a degree-$k$ polynomial $p(z_1, \ldots, z_m)$ such that

(i). For every $z \in \mathbb{R}^m$, $p(z) \ge F(z)$.

(ii). For every $z \in [-A, A]^m$, $p(z) \le F(z) + \epsilon$.

and $k \le O\left(\frac{A \cdot m^{3/2} L(F)}{\epsilon}\right)$.

The proof of Lemma 29 appears in Subsection 4.1.

We next apply Lemma 29 to show that $k$-wise distributions fool the sign of any function
$f : \{-1,1\}^n \to \mathbb{R}$ with the following properties. The function $f$ can be decomposed as $f(x) =
G(g_1(x), \ldots, g_m(x))$, where $g_1, \ldots, g_m$ are linear functions, the function $G$ is Lipschitz,
and the distribution of $f$ is not too concentrated around any specific value.

Lemma 30. Let $f : \{-1,1\}^n \to \mathbb{R}$ be a function which can be decomposed as $f(x) =
G(g_1(x), \ldots, g_m(x))$ where

(i). The functions $g_1, \ldots, g_m : \{-1,1\}^n \to \mathbb{R}$ are linear with $\mathbb{E}[g_1^2], \ldots, \mathbb{E}[g_m^2] \le 1$.

(ii). The function $G : \mathbb{R}^m \to \mathbb{R}$ is continuous and Lipschitz on the cube $[-C, C]^m$, for
$C = 100\sqrt{\log(m/\epsilon)}$.

(iii). The function $f$ is anti-concentrated, $\Pr[|f(x)| \le \alpha] \le \epsilon/100$ for some $\alpha$ depending on
$\epsilon$.

Then there exists a degree-$k$ polynomial $p : \{-1,1\}^n \to \mathbb{R}$ such that

• $p(x) \ge \mathrm{sgn}(f(x))$ for all $x \in \{-1,1\}^n$.

• $\mathbb{E}_{x \in U}[p(x) - \mathrm{sgn}(f(x))] \le \epsilon$.

where $k = O\left(\frac{m^5 L^2}{\alpha^2 \epsilon^2} \ln(mL/\alpha\epsilon)\right)$ and $L = \max(L_{[-C,C]^m}(G), 1)$.

The following claim bounds the Lipschitz constant of degree-$d$ polynomials.

Claim 31. Let $G : \mathbb{R}^m \to \mathbb{R}$ be a degree-$d$ polynomial. The Lipschitz constant of $G$ on
$[-C, C]^m$ is bounded by $dC^{d-1} \cdot wt(G)$.

Proof. We start by bounding the Lipschitz constant of monomials on $[-C, C]^m$. We then
get the result for $G$ by the additivity of the Lipschitz constant.

Let $M$ be a monomial $M(z_1, \ldots, z_d) = \prod_{i=1}^d z_i$. Let $z, z' \in [-C, C]^m$ be such that $\|z - z'\|_\infty \le \epsilon$.
Let $z'_i = z_i + e_i$, where $|e_i| \le \epsilon$. We have

$$\begin{aligned}
|M(z') - M(z)| &= \left| \sum_{k=1}^{d} \left( \prod_{i=1}^{k} (z_i + e_i) \prod_{i=k+1}^{d} z_i - \prod_{i=1}^{k-1} (z_i + e_i) \prod_{i=k}^{d} z_i \right) \right| \\
&\le \sum_{k=1}^{d} \prod_{i=1}^{k-1} |z_i + e_i| \cdot |e_k| \cdot \prod_{i=k+1}^{d} |z_i| \\
&\le dC^{d-1} \epsilon.
\end{aligned}$$

Hence $L_{[-C,C]^m}(M) \le dC^{d-1}$.

Write $G(z) = \sum_{I \subseteq [m], |I| \le d} a_I M_I(z)$ where $M_I$ are monomials. The Lipschitz constant of
$G$ on $[-C, C]^m$ is thus bounded by $\sum_I |a_I| L_{[-C,C]^m}(M_I) \le dC^{d-1} \cdot wt(G)$. □
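The monomial bound in Claim 31 is easy to probe numerically. The sketch below samples random pairs of points in $[-C, C]^d$ and records the largest observed ratio $|M(z') - M(z)| / \|z' - z\|_\infty$ for $M(z) = \prod_i z_i$; the sampling scheme and function names are our own and serve only as an illustration.

```python
import random


def monomial(z):
    """M(z) = z_1 * z_2 * ... * z_d."""
    prod = 1.0
    for v in z:
        prod *= v
    return prod


def max_lipschitz_ratio(d, C, trials=2000, seed=0):
    """Largest sampled value of |M(z') - M(z)| / ||z' - z||_inf on [-C, C]^d."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        z = [rng.uniform(-C, C) for _ in range(d)]
        w = [rng.uniform(-C, C) for _ in range(d)]
        dist = max(abs(a - b) for a, b in zip(z, w))
        if dist > 0.0:
            worst = max(worst, abs(monomial(z) - monomial(w)) / dist)
    return worst


if __name__ == "__main__":
    d, C = 4, 2.0
    print("sampled ratio:", max_lipschitz_ratio(d, C))
    print("bound d*C^(d-1):", d * C ** (d - 1))
```

The bound is approached near a corner of the cube, where moving all coordinates together changes the product at rate $dC^{d-1}$.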

We proceed to the proof of Theorem 28.

Proof of Theorem 28. Let $f(x)$ be a degree-$d$ polynomial, which can be decomposed as
$f(x) = G(g_1(x), \ldots, g_m(x))$ where $g_1, \ldots, g_m$ are linear and $\mathbb{E}[g_1^2] = \ldots = \mathbb{E}[g_m^2] = 1$. Set
$\delta = O(\epsilon/d)^{8d}$. By Lemma 22 there exists a decision tree $D$ of depth at most $\exp(O(d^{8d+1}/\epsilon^{8d}))$
such that

$$\Pr_{\ell \in L(D)}[\mathrm{Inf}_\infty(f|_\ell) > \delta] \le \epsilon/100.$$

By Lemma 13 we have for each linear function $g_i$

$$\Pr_{\ell \in L(D)}\big[\mathbb{E}[((g_i)|_\ell)^2] > t\big] \le \epsilon/100m$$

for $t = O(\log(m/\epsilon))$. Thus with probability $1 - \epsilon/100$, we have both that $\mathrm{Inf}_\infty(f|_\ell) \le \delta$ and
$\mathbb{E}[((g_i)|_\ell)^2] \le t$ for all $i \in [m]$. Fix such $\ell$. Since $f|_\ell$ has low influences, Lemma 19 gives

$$\Pr_{x \in U}[|f|_\ell(x)| \le \alpha] \le \epsilon/1000$$

for $\alpha = O(\epsilon/d)^d$.

Let $g'_i$ be a normalization of $(g_i)|_\ell$ such that $\mathbb{E}[(g'_i)^2] = 1$. We can write $f|_\ell(x) =
G'(g'_1(x), \ldots, g'_m(x))$ where $wt(G') \le wt(G) \cdot t$. By Claim 31 we have $L_{[-C,C]^m}(G) \le
dC^{d-1} \cdot wt(G)$ for $C = 100\sqrt{\log(m/100\epsilon)}$. Applying Lemma 30 we get there exists a
degree-$k$ polynomial $p_u(x)$ such that both $p_u(x) \ge \mathrm{sgn}(f|_\ell(x))$ for all $x \in \{-1,1\}^n$, and
$\mathbb{E}_{x \in U}[p_u(x) - \mathrm{sgn}(f|_\ell(x))] \le \epsilon/10$. Applying the same reasoning on the polynomial $-f(x)$
we get there exists a degree-$k$ polynomial $p_l(x)$ such that both $p_l(x) \le \mathrm{sgn}(f|_\ell(x))$ for all
$x \in \{-1,1\}^n$ and $\mathbb{E}_{x \in U}[\mathrm{sgn}(f|_\ell(x)) - p_l(x)] \le \epsilon/10$. Combining the two bounds we conclude
that $k$-wise distributions $\epsilon/10$-fool $f|_\ell$. Since this holds for a $1 - \epsilon/100$ fraction of the leaves
$\ell$, we get by Claim 12 that $k' = k + \mathrm{depth}(D)$ independence $\epsilon$-fools $f$.

We conclude by bounding $k$ and $k'$. We have $k = O\left(\frac{m^5 L^2}{\alpha^2 \epsilon^2} \log(mL/\alpha\epsilon)\right) = O(d/\epsilon)^d \cdot
m^5 wt(G)^2 \log(m/\epsilon)^d \cdot O(\log(d \cdot m \cdot wt(G)/\epsilon))$, and $\mathrm{depth}(D) = \exp((d/\epsilon)^{O(d)})$, hence we have
$k' = \exp((d/\epsilon)^{O(d)}) + \mathrm{poly}(O(\log m \cdot d/\epsilon)^d, m, wt(G))$, as claimed. □



4.1 Proof of Lemma 29

Our starting point is a fundamental result in approximation theory. Roughly
speaking, it says that any Lipschitz function can be well approximated by a low-degree
polynomial on a bounded region. Explicitly we use the following result of Ganzburg [17].

Lemma 32 (Multidimensional Jackson-type theorem, Theorem 1 in [17]). Let $F : \mathbb{R}^m \to \mathbb{R}$
be a Lipschitz function. For every $k$ there is a degree-$k$ polynomial $P_k(z_1, \ldots, z_m)$, such that

$$\sup_{z \in [-1,1]^m} |F(z) - P_k(z)| \le C \cdot \frac{m^{3/2} L(F)}{k}$$

where $C$ is an absolute constant.

We get the following corollary. 

Corollary 33. Let $F : \mathbb{R}^m \to \mathbb{R}$ be a Lipschitz function. For every $\epsilon > 0$ there exists
$k = O(m^{3/2} L(F)/\epsilon)$ and a degree-$k$ polynomial $p_k$ such that

• $p_k(z) \ge F(z)$ for all $z \in [-1,1]^m$

• $p_k(z) - F(z) \le \epsilon$ for all $z \in [-1,1]^m$.

Proof. Let $P_k$ be the polynomial obtained by Lemma 32 such that $\sup_{z \in [-1,1]^m} |F(z) - P_k(z)| \le
\epsilon/2$, and take $p_k(z) = P_k(z) + \epsilon/2$. □

We also need the following bound on the growth of real polynomials. 

Lemma 34. Let $g(w)$ be a univariate degree-$k$ polynomial. Then for every $w \in \mathbb{R}$,

$$|g(w)| \le \left(\max_{w' \in [-1,1]} |g(w')|\right) \cdot \left|w + \sqrt{w^2 - 1}\right|^k.$$

We will need the following corollary of Lemma 34.

Lemma 35. Let $p(z_1, \ldots, z_m)$ be a degree-$k$ polynomial, such that $|p(z)| \le c$ for all $z \in
[-1,1]^m$. If $|z_1| \ge |z_j|$ for every $1 \le j \le m$, then

$$|p(z)| \le c \cdot \max(|2z_1|^k, 1).$$

Proof. Assume w.l.o.g. that $|z_1| \ge |z_i|$ for every $i \in \{1, \ldots, m\}$. If $|z_1| \le 1$ then $(z_1, \ldots, z_m) \in
[-1,1]^m$ and by assumption $|p(z)| \le c$. Otherwise consider the following univariate polynomial
$g(w)$ that is obtained by restricting $p$ to the line passing through zero and $z$, defined as

$$g(w) = p(w, wz_2/z_1, \ldots, wz_m/z_1).$$

When $w \in [-1,1]$, we have $(w, wz_2/z_1, \ldots, wz_m/z_1) \in [-1,1]^m$. Hence $\max_{w \in [-1,1]} |g(w)| \le
c$. Applying Lemma 34 we get that

$$|p(z)| = |g(z_1)| \le c \cdot \left|z_1 + \sqrt{z_1^2 - 1}\right|^k \le c \cdot |2z_1|^k.$$

□
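The growth bound of Lemma 34 can be illustrated on the Chebyshev polynomials $T_k$, which satisfy $\max_{w \in [-1,1]} |T_k(w)| = 1$ and are the classical extremal case for growth outside $[-1,1]$. The sketch below is our own illustration, restricted to $w \ge 1$; it evaluates $T_k$ by the three-term recurrence and compares it with $(w + \sqrt{w^2 - 1})^k$ and with the cruder bound $(2w)^k$ used in Lemma 35.

```python
import math


def chebyshev(k, w):
    """T_k(w) via T_0 = 1, T_1 = w, T_{j+1} = 2 w T_j - T_{j-1}."""
    prev, cur = 1.0, w
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 2.0 * w * cur - prev
    return cur


def growth_bound(k, w):
    """(w + sqrt(w^2 - 1))^k for w >= 1."""
    return (w + math.sqrt(w * w - 1.0)) ** k


if __name__ == "__main__":
    for k in (1, 2, 5):
        for w in (1.0, 1.5, 3.0):
            print(k, w, chebyshev(k, w), growth_bound(k, w), (2.0 * w) ** k)
```

Since $T_k(w) = \frac{1}{2}\big((w + \sqrt{w^2-1})^k + (w - \sqrt{w^2-1})^k\big)$ for $w \ge 1$, the first bound is tight up to a factor of two on this family.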

We are now ready to state and prove the main lemma that will be used to prove Lemma 29.

Lemma 36. Let $F : \mathbb{R}^m \to [-1,1]$ be a Lipschitz function. For every $0 < \epsilon < 1$ there exists
a degree-$k$ polynomial $p'_k$ such that

• $p'_k(z) \ge F(z)$ for all $z \in \mathbb{R}^m$.

• $p'_k(z) - F(z) \le \epsilon$ for all $z \in [-1/4, 1/4]^m$.

where $k = O(m^{3/2} L(F)/\epsilon)$.

Proof. Let $p_k$ be the polynomial guaranteed by Corollary 33 for error $\epsilon/2$. Let $k' \ge
\max(k, 4m/\epsilon)$ be an even integer, and define

$$p'_k(z_1, \ldots, z_m) = p_k(z_1, \ldots, z_m) + 4\left((2z_1)^{k'} + \ldots + (2z_m)^{k'}\right).$$

We will prove that $p'_k(z) \ge F(z)$ for all $z \in \mathbb{R}^m$, and $p'_k(z) \le F(z) + \epsilon$ for $z \in [-1/4, 1/4]^m$.

Let $z \in \mathbb{R}^m$ be arbitrary. If $z \in [-1,1]^m$ we already have that $p'_k(z) \ge p_k(z) \ge F(z)$.
Otherwise, assume w.l.o.g. that $|z_1| \ge \max(|z_2|, \ldots, |z_m|)$, and hence $|z_1| > 1$.

Since $p_k$ approximates $F$ with error $\epsilon < 1$ on $[-1,1]^m$, we have that $|p_k(z)| \le 2$ for all
$z \in [-1,1]^m$. Applying Lemma 35 we get that

$$|p_k(z)| \le 2|2z_1|^k.$$

Thus in particular, $p_k(z) \ge -2|2z_1|^k$. By our definition of $p'_k(z)$ we get that

$$\begin{aligned}
p'_k(z) &= p_k(z) + 4\left((2z_1)^{k'} + \ldots + (2z_m)^{k'}\right) \\
&\ge -2|2z_1|^k + 4\left((2z_1)^{k'} + \ldots + (2z_m)^{k'}\right) \\
&\ge -2|2z_1|^k + 4(2z_1)^{k'} \\
&= -2|2z_1|^k + 4|2z_1|^{k'} \\
&\ge -2|2z_1|^k + 4|2z_1|^k \\
&= 2|2z_1|^k \ge 1,
\end{aligned}$$

and in particular we get that

$$p'_k(z) \ge F(z).$$

We next estimate the approximation obtained by $p'_k$ in $[-1/4, 1/4]^m$. Observe that for
$z \in [-1/4, 1/4]^m$,

$$|p'_k(z) - p_k(z)| \le 4m2^{-k'}$$

and by our choice of $k'$, we have that $|p'_k(z) - p_k(z)| \le \epsilon/2$. Since $p_k$ approximates $F$ on
$[-1,1]^m$ with error $\epsilon/2$, it does so in particular in $[-1/4, 1/4]^m$. Hence we get

$$\max_{z \in [-1/4,1/4]^m} p'_k(z) - F(z) \le \epsilon.$$

□
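To see the construction of Lemma 36 at work, consider a one-dimensional toy case of our own choosing: $F(z) = \max(-1, \min(1, z))$, which coincides with the polynomial $z$ on $[-1,1]$. Then $p_k(z) = z + \epsilon/2$ plays the role of the polynomial from Corollary 33 (here with $k = 1$), and the correction term $4(2z)^{k'}$ forces domination outside the cube. The sketch below checks both properties on a grid; the function names and the grid are assumptions of the illustration.

```python
import math


def clipped_identity(z):
    """Toy target F(z) = clip(z, -1, 1), a 1-Lipschitz function into [-1, 1]."""
    return max(-1.0, min(1.0, z))


def upper_polynomial(z, eps):
    """p'(z) = (z + eps/2) + 4 (2z)^{k'} with k' even and k' >= 4/eps (m = 1)."""
    k_prime = 2 * math.ceil(2.0 / eps)
    return z + eps / 2.0 + 4.0 * (2.0 * z) ** k_prime


if __name__ == "__main__":
    eps = 0.1
    grid = [j / 100.0 for j in range(-300, 301)]
    gap_everywhere = min(upper_polynomial(z, eps) - clipped_identity(z) for z in grid)
    gap_inner = max(upper_polynomial(z, eps) - clipped_identity(z)
                    for z in grid if abs(z) <= 0.25)
    print("min of p' - F on [-3, 3]:", gap_everywhere)
    print("max of p' - F on [-1/4, 1/4]:", gap_inner)
```

On $[-1/4, 1/4]$ the correction term is at most $4 \cdot 2^{-k'}$, which is why the even exponent $k'$ is chosen to be at least $4m/\epsilon$.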

The proof of Lemma 29 now follows as an immediate corollary of Lemma 36.

Proof of Lemma 29. Let $F : \mathbb{R}^m \to [-1,1]$ be a Lipschitz function. Define $F'(z) = F(4A \cdot z)$,
and apply Lemma 36 on $F'$ to obtain a polynomial $p'_k$ such that $p'_k(z) \ge F'(z)$ for all $z \in \mathbb{R}^m$
and $p'_k(z) \le F'(z) + \epsilon$ for $z \in [-1/4, 1/4]^m$. The polynomial $p(z) = p'_k(z/4A)$ is the desired
approximation polynomial for $F$. The bound on the degree follows from Lemma 36 since
$L(F') = 4A \cdot L(F)$. □



4.2 Proof of Lemma 30

We start with the following definition.

Definition 37 (zero-set). For $G : \mathbb{R}^m \to \mathbb{R}$ we define its zero-set, denoted $Z(G)$, to be

$$Z(G) = \{z \in \mathbb{R}^m : G(z) = 0\}.$$

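Lemma 38. Let $G : \mathbb{R}^m \to \mathbb{R}$ be a continuous real function. For every $\tau > 0$ there exists a
function $\tilde{G} : \mathbb{R}^m \to [-1,1]$ such that

• $\tilde{G}(z) \ge \mathrm{sgn}(G(z))$ for all $z \in \mathbb{R}^m$.

• For every $z \notin C(Z(G), \tau)$, $\tilde{G}(z) = \mathrm{sgn}(G(z))$.

• $L(\tilde{G}) \le O(m/\tau)$.

Proof. Set $\tau' = \tau/2$ and define

$$G'(z) = \max_{z' \in C(z,\tau')} \mathrm{sgn}(G(z'))$$

and

$$\tilde{G}(z) = \left(\frac{1}{2\tau'}\right)^m \int_{z' \in C(z,\tau')} G'(z')\,dz'.$$

First we argue that $\tilde{G}(z) \ge \mathrm{sgn}(G(z))$ for all $z \in \mathbb{R}^m$. For every $z' \in C(z, \tau')$ we have
$G'(z') \ge \mathrm{sgn}(G(z))$. Since $\tilde{G}(z)$ is defined as the average of $G'(z')$ over $z' \in C(z, \tau')$,
we get that $\tilde{G}(z) \ge \mathrm{sgn}(G(z))$.

We continue by showing that $\tilde{G}(z) = \mathrm{sgn}(G(z))$ for $z \notin C(Z(G), \tau)$. For $z' \in C(z, \tau)$ we
have $\mathrm{sgn}(G(z')) = \mathrm{sgn}(G(z))$, since $G$ is continuous and has no zeros in $C(z, \tau)$. As $C(z', \tau') \subseteq
C(z, \tau)$ for every $z' \in C(z, \tau')$, we have $G'(z') = \mathrm{sgn}(G(z))$, and hence we conclude that $\tilde{G}(z) = \mathrm{sgn}(G(z))$.

We next bound $L(\tilde{G})$. Let $z', z'' \in \mathbb{R}^m$. We consider the following two cases. If $\|z' -
z''\|_\infty \ge \tau'$ then since $\tilde{G}$ is bounded, i.e. $|\tilde{G}|_\infty \le 1$, we have

$$\frac{|\tilde{G}(z') - \tilde{G}(z'')|}{\|z' - z''\|_\infty} \le \frac{2}{\tau'}.$$

Otherwise, if $\|z' - z''\|_\infty < \tau'$, we have

$$\begin{aligned}
|\tilde{G}(z') - \tilde{G}(z'')| &= \left(\frac{1}{2\tau'}\right)^m \left| \int_{t \in C(z',\tau')} G'(t)\,dt - \int_{t \in C(z'',\tau')} G'(t)\,dt \right| \\
&\le \left(\frac{1}{2\tau'}\right)^m \int_{t \in C(z',\tau') \Delta C(z'',\tau')} |G'(t)|\,dt \\
&\le \left(\frac{1}{2\tau'}\right)^m |C(z',\tau') \Delta C(z'',\tau')|
\end{aligned}$$

where $\Delta$ denotes the symmetric difference between two sets.
A straightforward calculation shows that

$$|C(z',\tau') \Delta C(z'',\tau')| \le O(m(2\tau')^{m-1} \|z' - z''\|_\infty).$$

Hence we get

$$\frac{|\tilde{G}(z') - \tilde{G}(z'')|}{\|z' - z''\|_\infty} \le O(m/\tau').$$

□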
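The two-stage smoothing in the proof of Lemma 38 (a running maximum of the sign, followed by a running average) can be illustrated on a one-dimensional grid. The sketch below is a discretized analogue under our own assumptions: the integral is replaced by a mean over grid points, the half-window `r` counts grid steps (so $\tau' = r \cdot h$ for grid spacing $h$), windows are clipped at the ends of the grid, and $\mathrm{sgn}(0)$ is taken to be $+1$.

```python
def smooth_sign(signs, r):
    """Discrete analogue of G~: window maximum of the signs, then window mean.

    signs : list of +1/-1 values of sgn(G) on a uniform 1-D grid
    r     : half-window in grid steps (tau' = r * h)
    """
    n = len(signs)

    def window(j):
        return range(max(0, j - r), min(n, j + r + 1))

    g_max = [max(signs[t] for t in window(j)) for j in range(n)]            # G'
    return [sum(g_max[t] for t in window(j)) / len(window(j)) for j in range(n)]


if __name__ == "__main__":
    h, r = 0.01, 10
    grid = [-3.0 + j * h for j in range(601)]
    signs = [1 if z * z - 1.0 >= 0 else -1 for z in grid]   # G(z) = z^2 - 1
    smooth = smooth_sign(signs, r)
    steps = [abs(smooth[j + 1] - smooth[j]) for j in range(r, len(grid) - r - 1)]
    print("largest step between neighbours:", max(steps))
    print("discrete bound 2/(2r+1):", 2 / (2 * r + 1))
```

Away from the grid boundary, neighbouring windows differ in one entering and one leaving point, so consecutive values differ by at most $2/(2r+1)$; dividing by the spacing $h$ gives a slope of order $1/\tau'$, the one-dimensional case of the $O(m/\tau')$ bound.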



Lemma 39. Let $f(x) = G(g_1(x), \ldots, g_m(x))$ as in the definition of Lemma 30, and assume
that the assumptions of Lemma 22 hold. Then

$$\Pr_{x \in \{-1,1\}^n}[(g_1(x), \ldots, g_m(x)) \in C(Z(G), \tau)] \le \epsilon/10$$

for $\tau = \alpha/L$.

Proof. We consider two cases, the first when $(g_1(x), \ldots, g_m(x)) \in [-(C - \tau), C - \tau]^m$, and
the second when $(g_1(x), \ldots, g_m(x)) \notin [-(C - \tau), C - \tau]^m$.

In the first case, let $x \in \{-1,1\}^n$ be such that $(g_1(x), \ldots, g_m(x)) \in [-(C - \tau), C -
\tau]^m \cap C(Z(G), \tau)$. We will prove that $|f(x)| \le \alpha$, and by our assumption the probability
over all $x \in \{-1,1\}^n$ that $|f(x)| \le \alpha$ is bounded by $\epsilon/10$. To show that $|f(x)| \le \alpha$, let $z =
(g_1(x), \ldots, g_m(x)) \in \mathbb{R}^m$. The point $z$ is in $L_\infty$ distance of at most $\tau$ from a zero $z_0$ of $G$, and since
$z \in [-(C - \tau), C - \tau]^m$, we get that $z_0 \in [-C, C]^m$. Since $G$ is Lipschitz on $[-C, C]^m$, we
conclude that

$$|G(z)| \le |G(z_0)| + L_{[-C,C]^m}(G) \cdot \|z - z_0\|_\infty \le L_{[-C,C]^m}(G) \cdot \tau \le \alpha.$$

We now consider the second case, that $(g_1(x), \ldots, g_m(x)) \notin [-(C - \tau), C - \tau]^m$. We will
bound the probability that this event occurs. By our construction $\tau \le 1$, hence it is enough
to bound the probability that $(g_1(x), \ldots, g_m(x)) \notin [-(C - 1), C - 1]^m$, i.e. $|g_i(x)| > C - 1$
for some $i \in [m]$. Since we assumed each $g_i$ is $\delta$-normal, we get that

$$\Pr[|g_i(x)| > C - 1] \le 2(\delta + \Pr[N > C - 1])$$

where $N \sim N(0,1)$ is a standard normal variable. Using standard normal estimations and
setting $C = O(\sqrt{\log(m/\epsilon)})$ gives

$$\Pr[|g_i(x)| > C - 1] \le 2(\delta + \epsilon/100m).$$

Since $\delta \le \epsilon/100m$ we get that $\Pr[|g_i(x)| > C - 1] \le \epsilon/10m$, and using the union bound over
all $g_1, \ldots, g_m$ we get that the total error is bounded by $\epsilon/10$. □

The following lemma bounds the tail moments of linear functions, and is somewhat similar
to Lemma 4.2 in [13].

Lemma 40. Let $g : \{-1,1\}^n \to \mathbb{R}$ be a linear function with $\mathbb{E}[g^2] = 1$. Let $c > 0$ and
$A > 2c$. Then

$$\mathbb{E}_{x \in \{-1,1\}^n}\left[|g(x)|^{cA} \cdot 1_{|g(x)| \ge A}\right] \le 3A^{2cA} e^{2c^2} e^{-\frac{1}{2}(A - 2c)^2}.$$

Proof. Define $E_i = \mathbb{E}_{x \in \{-1,1\}^n}\left[|g(x)|^{cA} \cdot 1_{i \le |g(x)| < i+1}\right]$. We have to bound $E = \sum_{i \ge A} E_i$. By the
Hoeffding bound (see, e.g., [5]),

$$\Pr_{x \in U}[|g(x)| \ge i] \le 2e^{-i^2/2}.$$

Hence we get $E_i \le 2e^{-i^2/2}(i+1)^{cA}$. Moreover,

$$(i+1)^{cA} \le i^{2cA} = A^{2cA} \cdot (i/A)^{2cA} \le A^{2cA} \cdot e^{2ci}$$

where we used the fact that $x \le e^x$ for $x = i/A$. Summing over $i \ge A$ we get

$$\begin{aligned}
E &\le 2A^{2cA} \sum_{i \ge A} e^{-i^2/2 + 2ci} \\
&= 2A^{2cA} \sum_{i \ge A} e^{-\frac{1}{2}(i - 2c)^2 + 2c^2} \\
&\le 3A^{2cA} e^{2c^2} e^{-\frac{1}{2}(A - 2c)^2},
\end{aligned}$$

where the last step bounds the rapidly decaying sum $\sum_{i \ge A} e^{-\frac{1}{2}(i - 2c)^2}$ by a constant multiple of its first term. □
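The Hoeffding tail estimate used in this proof can be compared with exact tail probabilities in the balanced case. The sketch below takes the specific linear function $g(x) = (x_1 + \ldots + x_n)/\sqrt{n}$, which satisfies $\mathbb{E}[g^2] = 1$; this choice and the function name are ours, made only for illustration.

```python
from math import comb, exp, sqrt


def tail_probability(n, threshold):
    """Exact Pr[|g(x)| >= threshold] for g(x) = (x_1 + ... + x_n) / sqrt(n)."""
    count = 0
    for ones in range(n + 1):          # number of coordinates equal to +1
        s = 2 * ones - n               # value of x_1 + ... + x_n
        if abs(s) >= threshold * sqrt(n):
            count += comb(n, ones)
    return count / 2 ** n


if __name__ == "__main__":
    n = 100
    for i in range(1, 5):
        print(i, tail_probability(n, i), 2 * exp(-i * i / 2))
```

The exact tails sit below $2e^{-i^2/2}$ for every threshold, and the gap widens as $i$ grows, which is what allows the polynomially growing factor $(i+1)^{cA}$ to be absorbed in the sum.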
We are now ready to prove Lemma 30.

Proof of Lemma 30. Set $A > 1$ to be determined later. Let $\tilde{G} : \mathbb{R}^m \to [-1,1]$ be the Lipschitz
function approximating and bounding $\mathrm{sgn}(G)$ guaranteed by Lemma 38. Let $p : \mathbb{R}^m \to \mathbb{R}$
be the polynomial guaranteed by Lemma 29 approximating $\tilde{G}$ on $[-A, A]^m$ with error $\epsilon/10$.
The degree of $p$ is $k_1 = O\left(\frac{A \cdot m^{3/2} L(\tilde{G})}{\epsilon}\right) = A \cdot \phi(\epsilon)$, where $\phi(\epsilon) = O\left(\frac{m^{5/2} L}{\alpha\epsilon}\right)$ is independent of
our choice of $A$. Set $p^* : \{-1,1\}^n \to \mathbb{R}$ to be defined as

$$p^*(x) = p(g_1(x), \ldots, g_m(x)).$$

We have that

• The polynomial $p^*$ is of degree at most $A \cdot \phi(\epsilon)$.

• For all $x \in \{-1,1\}^n$, $p^*(x) \ge \mathrm{sgn}(f(x))$.

• For all $x$ such that $(g_1(x), \ldots, g_m(x)) \in [-A, A]^m \setminus C(Z(G), \tau)$ we have $p^*(x) \le
\mathrm{sgn}(f(x)) + \epsilon/10$.

• For all $x$ such that $(g_1(x), \ldots, g_m(x)) \in [-A, A]^m \cap C(Z(G), \tau)$ we have $p^*(x) \le 2$.

To conclude the proof we have to show that $\mathbb{E}_x[p^*(x) - \mathrm{sgn}(f(x))] \le \epsilon$. We consider three
ranges of values for $x$.

(i). $x \in \{-1,1\}^n$ such that $(g_1(x), \ldots, g_m(x)) \in [-A, A]^m \setminus C(Z(G), \tau)$.

(ii). $x \in \{-1,1\}^n$ such that $(g_1(x), \ldots, g_m(x)) \in [-A, A]^m \cap C(Z(G), \tau)$.

(iii). $x \in \{-1,1\}^n$ such that $(g_1(x), \ldots, g_m(x)) \notin [-A, A]^m$.

To bound (i), we use the fact that for all $x$ such that $(g_1(x), \ldots, g_m(x)) \in [-A, A]^m \setminus
C(Z(G), \tau)$ we know that $p^*(x) - \mathrm{sgn}(f(x)) \le \epsilon/10$, hence the total contributed error is
bounded by $\epsilon/10$.

To bound (ii), we use Lemma 39 to conclude that the probability over $x \in \{-1,1\}^n$ that
$(g_1(x), \ldots, g_m(x)) \in C(Z(G), \tau)$ is bounded by $\epsilon/10$. Since we know that for such $x$ we have
$p^*(x) \le 2$ and $\mathrm{sgn}(f(x)) \ge -1$, we can bound the total error by $3\epsilon/10$.

Finally, let $\epsilon_3$ be the error in (iii). Namely,

$$\epsilon_3 = \mathbb{E}_x\left[(p(g_1(x), \ldots, g_m(x)) - \mathrm{sgn}(f(x))) \cdot 1_{(g_1(x), \ldots, g_m(x)) \notin [-A,A]^m}\right].$$

We bound $\epsilon_3$ by the union bound over which of $g_1(x), \ldots, g_m(x)$ is maximal.

$$\epsilon_3 \le \sum_{i=1}^{m} \mathbb{E}_x\left[(p(g_1(x), \ldots, g_m(x)) - \mathrm{sgn}(f(x))) \cdot 1_{(g_1(x), \ldots, g_m(x)) \notin [-A,A]^m} \cdot 1_{g_i(x) = \max(g_1(x), \ldots, g_m(x))}\right]$$

Since $|p(z)| \le 2$ for $z \in [-1,1]^m$ and $|\mathrm{sgn}(f(x))| = 1$, by Lemma 35 we get

$$\begin{aligned}
\epsilon_3 &\le \sum_{i=1}^{m} \mathbb{E}_x\left[(2|2g_i(x)|^{\deg(p)} + 1) \cdot 1_{|g_i(x)| \ge A}\right] \\
&\le 2^{\deg(p)+2} \sum_{i=1}^{m} \mathbb{E}_x\left[|g_i(x)|^{\deg(p)} \cdot 1_{|g_i(x)| \ge A}\right].
\end{aligned}$$

Recall that $\deg(p) = k_1 = A \cdot \phi(\epsilon)$. Using Lemma 40 we get the bound

$$\epsilon_3 \le 3m2^{cA+2} e^{2cA \ln A + 2c^2 - \frac{1}{2}(A - 2c)^2}$$

where $c = \phi(\epsilon)$. Recall that $\phi(\epsilon) \ge m/\epsilon$, hence we get that picking $A = \Omega(c \ln c) =
\Omega(\phi(\epsilon) \ln(\phi(\epsilon)))$ will yield $\epsilon_3 \le \epsilon/10$. □

Acknowledgement. We are grateful to Moshe Dubiner for his great help with approximation
theory and in particular in proving Lemma 36.